New M Functionality And Behaviour In Power BI Custom Data Connectors

Over the past few weeks I’ve spent some time playing around with Power BI custom data connectors and while I don’t have anything to share publicly yet (other people are way ahead of me in this respect – see the work of Igor Cotruta, Miguel Escobar and Kasper de Jonge among others) I have learned some interesting things that are worth blogging about.

First of all, the data privacy rules around combining data from different data sources do not apply in custom data connector code. As the docs say here:

Data combination checks do not occur when accessing multiple data sources from within an extension. Since all data source calls made from within the extension inherit the same authorization context, it is assumed they are “safe” to combine. Your extension will always be treated as a single data source when it comes to data combination rules. Users would still receive the regular privacy prompts when combining your source with other M sources.

Those of you who have followed my recent series on this topic, or who have struggled with the Formula.Firewall error, will appreciate how much easier this makes combining data from different sources.

Secondly, you have a lot more flexibility when it comes to different types of authentication for web services. As I showed in my session on web services and M at the Data Insights Summit, there are a lot of limitations when it comes to working with web services in Power BI or Excel. Within a custom data connector, however, you can connect to web services that use OAuth for authentication, you can make POST requests to web services that require authentication and you can pass a web API key from the credentials store through an HTTP custom header and not just through a query parameter – none of which are possible in Power BI or Excel.

I’m sure there are a lot of other useful bits of functionality or behaviour that are only available in custom data connectors – I know I’ve only just begun to learn what’s possible. Even with what I’ve listed here, though, I get the feeling that there will be a lot of cases where you will have no choice but to build a custom data connector just to be able to access certain data sources, even if you only need to create a single report. There may also be cases where it’s preferable to build a custom data connector rather than embed lots of complex M code in a Power BI report or Excel workbook, perhaps to make code portability easier. It’s a bit of a pain to have to have Visual Studio and the SDK installed in order to do this, but building a custom data connector is fairly easy if you already know M and the development experience in Visual Studio (with intellisense!) is much better than in the Advanced Query Editor window.

Data Privacy Settings In Power BI/Power Query, Part 3: The Formula.Firewall Error

In the first two parts of this series (see here and here) I showed how Power BI/Power Query/Excel Get & Transform’s data privacy settings can influence whether query folding takes place or even whether a query is able to run or not. In this post I’m going to talk about the situations where, whatever data privacy level you use, the query will not run at all and you get the infamous Formula.Firewall error.

I’ll admit I don’t understand this particular topic perfectly (I’m not sure anyone outside the Power Query dev team does) so what I will do is explain what I do know, demonstrate a few scenarios where the error occurs and show how to work around it.

Assume you have the two data sources described in my previous posts: an Excel workbook that contains just a single day name, and the DimDate table in SQL Server that can be filtered by the day name from Excel. Let’s also assume that both data sources have their data privacy levels set to Public. The following query, called FilterDay, loads the data from Excel and returns a text value containing the day name:

let
    Source = 
	Excel.Workbook(
		File.Contents("C:\FilterParameter.xlsx"), 
	null, true),
    FilterDay_Table = 
	Source{[Item="FilterDay",Kind="Table"]}[Data],
    ChangedType = 
	Table.TransformColumnTypes(
		FilterDay_Table,
		{{"Parameter", type text}}
	),
    Output = 
	ChangedType{0}[#"Parameter"]
in
    Output

image

Now, look at the following query:

let
    Source = 
	Sql.Database(
		"localhost", 
		"adventure works dw",
		[Query="select DateKey, EnglishDayNameOfWeek 
		from DimDate"]),
    FilteredRows = 
	Table.SelectRows(Source, 
		each ([EnglishDayNameOfWeek] = FilterDay)
	)
in
    FilteredRows

It filters the contents of the DimDate table and only returns the rows where the EnglishDayNameOfWeek column matches the day name returned by the FilterDay query. Notice that there are two steps in the query, Source (which runs a SQL query) and FilteredRows (which does the filtering). Here’s the output:

image

As you can see from the screenshot, the query runs. In fact it runs whatever data privacy settings you have set on both the data sources, although it’s worth pointing out that if you use your own SQL in an M query (as I do in this case) this stops query folding in all subsequent steps, as described here.

Now take a look at the following version of the query:

let
    Source = 
	Table.SelectRows(
		Sql.Database(
			"localhost", 
			"adventure works dw",
			[Query="select DateKey, 
				EnglishDayNameOfWeek 
				from DimDate"]
		), 
		each ([EnglishDayNameOfWeek] = FilterDay)
	)
in
    Source

The important difference here is that there is now one step in this query instead of two: the query and the filtering take place in the same step. Even more importantly, regardless of the data privacy settings, the query fails with the error:

Formula.Firewall: Query ‘DimDate With Native Query Single Step Fails’ (step ‘Source’) references other queries or steps, so it may not directly access a data source. Please rebuild this data combination.

image

The problem here is that the Power Query engine is not allowed to access two different data sources originating from different queries in the same step – as far as I understand it this is because it makes it too hard for the engine to work out whether a step connects to a data source or not, and so which data privacy rules should be applied.

At this point you might think that it’s straightforward to break your logic up into separate steps, as in the first example above. However there are some situations where it’s not so easy to work around the problem. For example, consider the following query:

let
    Source = 
	Sql.Database(
		"localhost", 
		"adventure works dw",
		[Query="
		 select DateKey, EnglishDayNameOfWeek 
		 from DimDate 
		 where 
		 EnglishDayNameOfWeek='" & FilterDay & "'" 
		]
	)
in
    Source

In this example I’m dynamically generating the SQL query that is being run and passing the name of the day to filter by into the WHERE clause. In the two previous examples the query that was run had no WHERE clause and the filtering on day name took place inside Power BI – in this case the filtering is happening inside the query, so in order to generate the WHERE clause I have to refer to the value that the FilterDay query returns in the same step. Therefore, this query also gives the same Formula.Firewall error seen above.

How can you work around this? Well, the following version of the query that attempts to reference FilterDay in a separate step doesn’t work either:

let
    DayAsStep = FilterDay,
    Source = 
	Sql.Database(
		"localhost", 
		"adventure works dw",
		[Query="
		 select DateKey, EnglishDayNameOfWeek 
		 from DimDate 
		 where 
		 EnglishDayNameOfWeek='" & DayAsStep & "'" 
		]
	)
in
    Source

 

Luckily, it turns out that if you use the Value.NativeQuery() function to run your query instead you can avoid the error. As I showed here, you can use this function to pass parameters to SQL queries. If you generate the record containing the parameters for the query as a separate step (called ParamRecord here), like so:

let
    Source = Sql.Database("localhost", "adventure works dw"),
    ParamRecord = [FilterParameter=FilterDay],
    Query = Value.NativeQuery(
                Source, 
                "select DateKey, EnglishDayNameOfWeek 
		from DimDate 
		where 
		EnglishDayNameOfWeek=@FilterParameter",
                ParamRecord)
in
    Query

Then the query runs successfully.

There is another way to avoid the error. In all the examples above I have two queries: one to get data from Excel, one to get filtered data from SQL Server. If these two queries are combined into a single query, it doesn’t matter if data from different data sources is accessed in the same step. So, for example, unlike all of the queries above the following query does not reference any other queries; instead it gets the day name from the Excel workbook in the ExcelSource step and then runs the dynamic SQL query in the SQLSource step, and runs successfully:

let
    ExcelSource = 
	Excel.Workbook(
		File.Contents("C:\FilterParameter.xlsx")
	, null, true),
    FilterDay_Table = 
	ExcelSource{[Item="FilterDay",Kind="Table"]}[Data],
    ChangedType = 
	Table.TransformColumnTypes(FilterDay_Table,
		{{"Parameter", type text}}),
    FilterDayStep = 
	ChangedType{0}[#"Parameter"],
    SQLSource = Sql.Database(
	"localhost", 
	"adventure works dw",
	[Query="
		select DateKey, EnglishDayNameOfWeek 
		from DimDate 
		where 
		EnglishDayNameOfWeek='" 
		& FilterDayStep & 
		"'" ])
in
    SQLSource

Clearly the M engine doesn’t get confused about accessing data from different sources in the same step if those data sources are created in the same query.

Of course you can avoid the Formula.Firewall error and make query folding happen as often as possible by turning off data privacy checks completely in the Options dialog. This will be the subject of the next post in this series.

Pre-Conference Training Day On Data Loading With Power BI/Excel Get & Transform/Power Query At PASS Summit

I’m pleased to announce that I’m running a pre-conference training day on “Loading and transforming data with Power BI and Power Query” at this year’s PASS Summit in Seattle. For the full agenda and details on how to register, see:
http://www.pass.org/summit/2017/Sessions/Details.aspx?sid=66485

This full-day session will teach you everything you need to know about loading data into Power BI or Excel; it’s specifically focussed on self-service BI scenarios, but if you want to learn M for SSAS 2017 you might find some of what I cover useful too. Topics covered include using the UI to build queries; using parameters and functions; the M language; data privacy settings; and creating custom data connectors. I hope to see you there!

Query Folding And Writing Your Own SQL Queries In Power Query/Power BI/Excel Get & Transform

When you connect to a relational database like SQL Server in Power BI/Power Query/Excel Get & Transform you have two choices about how to get the data you need:

  1. You can choose a table from the database and then either use the Query Editor UI or write some M to get the data you need from that table. For example, you might choose a table that has one row for every product that your company sells and then, using the UI, filter that down to only the products that are red.
  2. You can enter a SQL query that gets the data you need.

Something that you might not realise is that if you choose the second option and then subsequently use the UI to apply even more filtering or transformation, then those subsequent steps will not be able to make use of query folding.

As an example of option (1), imagine you connect to the DimProduct table in the SQL Server Adventure Works DW database like so:

image

image

The following M query is generated by the Query Editor when you filter the table to only return the red products and remove all columns except EnglishProductName. That’s very easy to do so I won’t describe it, but here’s the M:

let
    Source = 
	Sql.Databases("localhost"),
    #"Adventure Works DW" = 
	Source{
		[Name="Adventure Works DW"]
	}[Data],
    dbo_DimProduct = 
	#"Adventure Works DW"{
		[Schema="dbo",Item="DimProduct"]
	}[Data],
    #"Filtered Rows" = 
	Table.SelectRows(
		dbo_DimProduct, 
		each ([Color] = "Red")
	),
    #"Removed Other Columns" = 
	Table.SelectColumns(
		#"Filtered Rows",
		{"EnglishProductName"}
	)
in
    #"Removed Other Columns"

image

Using the View Native Query option, you can find out that the following SQL is generated to get this data:

select [_].[EnglishProductName]
from [dbo].[DimProduct] as [_]
where [_].[Color] = 'Red'

image

image

It’s pretty clear that query folding is taking place for the filter on “red” and for the selection of the required column.

However, if you enter the following SQL query when you first connect to the database:

select * from dimproduct

image

And then, after that, filter the table and remove columns in exactly the same way, you get the following M query:

let
    Source = 
	Sql.Database(
		"localhost", 
		"Adventure Works DW", 
		[Query="select * from dimproduct"]),
    #"Filtered Rows" = 
	Table.SelectRows(
		Source, 
		each ([Color] = "Red")),
    #"Removed Other Columns" = 
	Table.SelectColumns(
		#"Filtered Rows",
		{"EnglishProductName"})
in
    #"Removed Other Columns"

If you now try to use the View Native Query option on either the Removed Other Columns or Filtered Rows steps you’ll find it’s greyed out, indicating query folding is not taking place for those steps:

image

The query you enter is run and then Power BI applies the filter and selects the column itself in the resultset that the SQL query returns.

This obviously has big implications for performance. The lesson here is that if you’re going to write your own SQL query in the Query Editor, you should make sure it does all of the expensive filters and transformations you need because anything else you do in the query will happen outside the database in Power BI or Excel.

Power BI Custom Data Connectors

Amid all the other announcements at Microsoft Build today came one I’ve been waiting for a long time: we can now build our own custom data connectors for Power BI. Here’s the announcement on the Power BI blog:

https://powerbi.microsoft.com/en-us/blog/data-connectors-developer-preview/

…and here are the docs:

https://github.com/Microsoft/DataConnectors

Although it might not sound that exciting, I think this is one of the biggest things that’s happened to Power BI in a long time. Microsoft’s most successful products are always the ones that are platforms for its partners to build their own solutions on top of, and Power BI is no exception. Custom visuals have made an important contribution to the success of Power BI as a whole, allowing third parties to build visuals that Microsoft either didn’t have the resources to build itself or didn’t realise people wanted. Similarly, custom data connectors will only increase the already impressive number of data sources that Power BI can connect to. My guess is that a lot of SaaS and data providers will want to build their own custom data connectors, and that the number of custom data connectors available is going to explode. Why go to all the trouble of trying to build BI and reporting features into your product when it’s quicker and easier to send the data to Power BI and your users get a much better experience as a result?

This doesn’t just affect Power BI though. Since the new Get & Transform functionality for loading data into Excel (see here) is based on the same Power Query technology that is used by Power BI, these custom data connectors will at some point work with Excel too. Although it will be a while before most Excel users upgrade to a version that supports custom data connectors, when that happens the sheer size of the Excel market means that there will be even more demand for custom connectors and this will in turn strengthen the Power BI ecosystem. Don’t forget that Analysis Services Tabular 2017 also now uses Power Query for loading data, so this will benefit corporate BI users too.

From a purely technical point of view today’s announcement included some welcome new capabilities for M coders like me. These include Visual Studio integration (intellisense for M!) and the ability to connect to web services and use OAuth for authentication, something that wasn’t really possible before. It’s also interesting that you can create custom data connectors that enable DirectQuery connections from Power BI, and that you aren’t limited to importing data.

I can’t wait to start using all these features and I’m sure they will be the subject of many future blog posts. Also, if you’re looking to hire someone with a good understanding of M to build a custom data connector, I’m your guy. You can find my contact details at http://www.crossjoin.co.uk/

DAX Functions, DirectQuery And Unrestricted Measures

If you use Power BI to connect to a data source such as SQL Server in DirectQuery mode, you’ll find that by default you cannot use the complete range of DAX functions inside your calculations. For example, here’s a screenshot from Power BI Desktop in DirectQuery mode with a measure where I’m trying to use the RankX() function:

image

Whenever you get an error message like “Function ‘RANKX’ is not supported in DirectQuery mode” you know you’re trying to use a DAX function that isn’t allowed by default in DirectQuery mode.

Why does this limitation exist? Well, as the documentation points out here:

https://powerbi.microsoft.com/en-us/documentation/powerbi-desktop-use-directquery/#limitations-of-directquery

…this is done for performance reasons. In Power BI, every time you interact with a report (by opening it, clicking on a slicer etc) a series of DAX queries is sent back to the Data Model to get the data needed by the report. In DirectQuery mode, all of these DAX queries – including any calculations – are translated into queries that are sent back to the data source. If that data source is SQL Server, then Power BI generates SQL queries. Not everything that you can do in a DAX calculation can be translated back into efficient SQL, so to prevent you from accidentally building calculations that will perform badly in DirectQuery mode Power BI tries to prevent you from using any DAX functions that it thinks it can’t generate efficient SQL for.

You can remove this restriction by clicking File/Options and Settings/Options to open the Options dialog and then going to the DirectQuery pane and selecting “Allow unrestricted measures in DirectQuery mode”:

image

There is no documented list of functions that work in DirectQuery mode (probably because it might change with each release) but there is a way to get one. Using SQL Server Management Studio it is possible to connect to Power BI Desktop and run DMV queries against the Power BI Data Model. If you query the MDSCHEMA_FUNCTIONS DMV like so:

select * from $SYSTEM.MDSCHEMA_FUNCTIONS

You’ll see that it returns a list of all of the DAX and MDX functions understood by Power BI; the last column, called DIRECTQUERY_PUSHABLE contains information used by Power BI to decide whether a calculation is available in DirectQuery mode or not:

image

[At the time of writing this query doesn’t work in DAX Studio for some reason, I think because of the data returned in the PARAMETERINFO column, but if you try to restrict the rows or columns returned by the DMV in any way the DIRECTQUERY_PUSHABLE column only returns empty values. This is why I say you have to use SQL Server Management Studio]

For the DAX functions in the list, if DIRECTQUERY_PUSHABLE is 0 then it is not available in DirectQuery mode by default. If it returns 1 then it is available in measures but not calculated columns; if it returns 3 it is available in both measures and calculated columns (I guess 2 means that it is only available in calculated columns, but there are no functions with this property value right now). However this is not a definitive guide to whether a function is actually ‘pushable’ or not: some functions could be pushable in some scenarios, or with certain combinations of parameters, and not others and the UI may perform additional checks to decide whether a function can be used.

Thanks to Jeffrey Wang and Marius Dumitru for answering my questions about this.

By the way, if you’re reading this, you might also be interested in this post I wrote a while back on the functionality available in the Query Editor in DirectQuery mode:

https://blog.crossjoin.co.uk/2015/11/11/power-bi-directquery-mode-not-just-ssas-directquery-v2-0/

UPDATE – as Marco points out in his comments below, there *is* documentation on which functions are supported in DirectQuery mode now. I just didn’t know about it…

Using Text.BetweenDelimiters() To Extract URLs From A Web Page In Power BI/Power Query M

The Add Column By Example functionality that appeared in the April 2017 release of Power BI Desktop is, I have to say, very cool: I’ve used it quite a bit in the last few weeks and it really does work well. One thing I noticed while using it is that three new functions have been added to the M language to support this feature:

  • Text.BetweenDelimiters() which takes a text value and extracts the part that appears between two given delimiter values
  • Text.BeforeDelimiter() which takes a text value and extracts the part that appears before a given delimiter value
  • Text.AfterDelimiter() which takes a text value and extracts the part that appears after a given delimiter value

The functions themselves are quite straightforward and the online documentation has some good examples of how to use them. To save you the click here’s an extra example – the expression:

Text.BetweenDelimiters("Hello *world!??", "*", "!")

…returns the text “world”:

image

As it happens last week I received an email from a reader who wanted to know if it was possible to extract all the links from the href attributes in the source of a web page using M, and I realised that Text.BetweenDelimiters() would be very useful for doing this. I wrote the following M function to demonstrate:

(SourceURL as text, AttributeDelimiter as text) =>
let
	//Get HTML source
    Source = Text.FromBinary(Web.Contents(SourceURL)),
	//Function to find each link
    GetLink = (Counter as number) =>
                    let
                        CurrentLink = 
		Text.BetweenDelimiters(
			Source, 
			"href=" & AttributeDelimiter, 
			AttributeDelimiter,
			Counter
		)
                    in
                        if CurrentLink="" 
		then 
			{} 
		else 
			List.Combine({
				{CurrentLink}, 
				@GetLink(Counter+1)}
			),
	//Call function
    Output = GetLink(0)
in
    Output

A few things to note:

  • I’m using a combination of Text.FromBinary() and Web.Contents() to get the HTML source for the web page whose links we’re extracting
  • Since HTML allows the use of single and double quotes for attributes, I’ve added a parameter to my function called AttributeDelimiter to allow either to be passed in
  • Text.BetweenDelimiters only extracts one piece of text at a time, but you can specify which occurrence of the start delimiter it uses. I therefore used recursion to extract the contents of every href attribute in the HTML: I declare a function called GetLink, and from within that function I can make a recursive call by putting an @ before the function name as in line 22 above. It would probably be better to use List.Generate() instead of recursion here though.

Assuming the query that returns this function is called GetAllLinks

image

…then it can be called in a new query like so:

GetAllLinks(
	"https://msdn.microsoft.com/en-us/library/mt798303.aspx", 
	""""
)

One other thing to point out is how, in order to pass a double quote character to the function as text, since text has itself to be enclosed in double quotes I need to use four double quotes: “”””

The output of this query is a list containing all of the links from the href attributes on the page that are enclosed in double quotes:

image

I guess this could be taken even further to create a function that crawls a series of web pages and returns the links in all of them, then uses the Force Directed Graph custom visual or better still NodeXL in Excel to show which pages link to each other. I’ll leave that to someone else to do though…!

You can download a pbix file with all of the examples in this post here.