If you’re looking for classroom-based training on Microsoft BI, data science or SQL Server then check out the list of courses I have coming up at Technitrain this autumn:
Introduction to Power BI, taught by me, September 25th-27th – a three day course covering the basics of Power BI suitable for BI pros and business analysts with no previous experience of the product.
Database DevOps, taught by Alex Yates, September 25th-27th – a course for DBAs, developers or anyone who plays a role in writing, testing or deploying changes to SQL Server.
Practical Data Science with Cortana Intelligence, taught by Rafal Lukawiecki, November 6th-10th – an introductory course for budding data scientists covering theory and how to apply it to Microsoft tools including Azure Machine Learning and SQL Server 2016 R Services.
Mastering DAX, taught by Alberto Ferrari, November 27th-29th – a three day introductory course that will teach you how to write DAX calculations and queries in Power BI, Power Pivot and Analysis Services Tabular.
Optimising DAX, taught by Alberto Ferrari, November 30th-1st December – learn about performance tuning DAX from one of the leading experts in the field!
In the first two parts of this series (see here and here) I showed how Power BI/Power Query/Excel Get & Transform’s data privacy settings can influence whether query folding takes place or even whether a query is able to run or not. In this post I’m going to talk about the situations where, whatever data privacy level you use, the query will not run at all and you get the infamous Formula.Firewall error.
I’ll admit I don’t understand this particular topic perfectly (I’m not sure anyone outside the Power Query dev team does) so what I will do is explain what I do know, demonstrate a few scenarios where the error occurs and show how to work around it.
Assume you have the two data sources described in my previous posts: an Excel workbook that contains just a single day name, and the DimDate table in SQL Server that can be filtered by the day name from Excel. Let’s also assume that both data sources have their data privacy levels set to Public. The following query, called FilterDay, loads the data from Excel and returns a text value containing the day name:
let
Source =
Excel.Workbook(
File.Contents("C:\FilterParameter.xlsx"),
null, true),
FilterDay_Table =
Source{[Item="FilterDay",Kind="Table"]}[Data],
ChangedType =
Table.TransformColumnTypes(
FilterDay_Table,
{{"Parameter", type text}}
),
Output =
ChangedType{0}[#"Parameter"]
in
Output
Now, look at the following query:
let
Source =
Sql.Database(
"localhost",
"adventure works dw",
[Query="select DateKey, EnglishDayNameOfWeek
from DimDate"]),
FilteredRows =
Table.SelectRows(Source,
each ([EnglishDayNameOfWeek] = FilterDay)
)
in
FilteredRows
It filters the contents of the DimDate table and only returns the rows where the EnglishDayNameOfWeek column matches the day name returned by the FilterDay query. Notice that there are two steps in the query, Source (which runs a SQL query) and FilteredRows (which does the filtering). Here’s the output:
As you can see from the screenshot, the query runs. In fact it runs whatever data privacy settings you have set on both the data sources, although it’s worth pointing out that if you use your own SQL in an M query (as I do in this case) this stops query folding in all subsequent steps, as described here.
Now take a look at the following version of the query:
let
Source =
Table.SelectRows(
Sql.Database(
"localhost",
"adventure works dw",
[Query="select DateKey,
EnglishDayNameOfWeek
from DimDate"]
),
each ([EnglishDayNameOfWeek] = FilterDay)
)
in
Source
The important difference here is that there is now one step in this query instead of two: the query and the filtering take place in the same step. Even more importantly, regardless of the data privacy settings, the query fails with the error:
Formula.Firewall: Query ‘DimDate With Native Query Single Step Fails’ (step ‘Source’) references other queries or steps, so it may not directly access a data source. Please rebuild this data combination.
The problem here is that the Power Query engine is not allowed to access two different data sources originating from different queries in the same step – as far as I understand it this is because it makes it too hard for the engine to work out whether a step connects to a data source or not, and so which data privacy rules should be applied.
At this point you might think that it’s straightforward to break your logic up into separate steps, as in the first example above. However there are some situations where it’s not so easy to work around the problem. For example, consider the following query:
let
Source =
Sql.Database(
"localhost",
"adventure works dw",
[Query="
select DateKey, EnglishDayNameOfWeek
from DimDate
where
EnglishDayNameOfWeek='" & FilterDay & "'"
]
)
in
Source
In this example I’m dynamically generating the SQL query that is being run and passing the name of the day to filter by into the WHERE clause. In the two previous examples the query that was run had no WHERE clause and the filtering on day name took place inside Power BI – in this case the filtering is happening inside the query, so in order to generate the WHERE clause I have to refer to the value that the FilterDay query returns in the same step. Therefore, this query also gives the same Formula.Firewall error seen above.
How can you work around this? Well, the following version of the query that attempts to reference FilterDay in a separate step doesn’t work either:
let
DayAsStep = FilterDay,
Source =
Sql.Database(
"localhost",
"adventure works dw",
[Query="
select DateKey, EnglishDayNameOfWeek
from DimDate
where
EnglishDayNameOfWeek='" & DayAsStep & "'"
]
)
in
Source
Luckily, it turns out that if you use the Value.NativeQuery() function to run your query instead you can avoid the error. As I showed here, you can use this function to pass parameters to SQL queries. If you generate the record containing the parameters for the query as a separate step (called ParamRecord here), like so:
let
Source = Sql.Database("localhost", "adventure works dw"),
ParamRecord = [FilterParameter=FilterDay],
Query = Value.NativeQuery(
Source,
"select DateKey, EnglishDayNameOfWeek
from DimDate
where
EnglishDayNameOfWeek=@FilterParameter",
ParamRecord)
in
Query
Then the query runs successfully.
There is another way to avoid the error. In all the examples above I have two queries: one to get data from Excel, one to get filtered data from SQL Server. If these two queries are combined into a single query, it doesn’t matter if data from different data sources is accessed in the same step. So, for example, unlike all of the queries above the following query does not reference any other queries; instead it gets the day name from the Excel workbook in the ExcelSource step and then runs the dynamic SQL query in the SQLSource step, and runs successfully:
let
ExcelSource =
Excel.Workbook(
File.Contents("C:\FilterParameter.xlsx")
, null, true),
FilterDay_Table =
ExcelSource{[Item="FilterDay",Kind="Table"]}[Data],
ChangedType =
Table.TransformColumnTypes(FilterDay_Table,
{{"Parameter", type text}}),
FilterDayStep =
ChangedType{0}[#"Parameter"],
SQLSource = Sql.Database(
"localhost",
"adventure works dw",
[Query="
select DateKey, EnglishDayNameOfWeek
from DimDate
where
EnglishDayNameOfWeek='"
& FilterDayStep &
"'" ])
in
SQLSource
Clearly the M engine doesn’t get confused about accessing data from different sources in the same step if those data sources are created in the same query.
Of course you can avoid the Formula.Firewall error and make query folding happen as often as possible by turning off data privacy checks completely in the Options dialog. This will be the subject of the next post in this series.
Melissa has already blogged about the white paper here.
Topics covered include the different ways that Power BI can be deployed (as a self-service BI tool or as a corporate BI tool); licensing; preparing data for use in Power BI; choosing a data storage mode (import vs Live connections to SSAS vs DirectQuery); data refresh and the on-premises gateway; best practices for report development; collaboration and sharing (covering apps and content packs); options for consuming reports and data published to Power BI; and security, compliance and administration. If that sounds like a lot, it is: it’s 105 pages long!
It was a real pleasure working with Melissa on this, and I’d also like to thank Meagan Longoria for reviewing it. I’m also extremely grateful to Adam Wilson and an army of Microsoft employees for providing information, answering our questions and correcting our mistakes.
This full-day session will teach you everything you need to know about loading data into Power BI or Excel; it’s specifically focussed on self-service BI scenarios, but if you want to learn M for SSAS 2017 you might find some of what I cover useful too. Topics covered include using the UI to build queries; using parameters and functions; the M language; data privacy settings; and creating custom data connectors. I hope to see you there!
I can’t remember ever attending a conference where there was so much new functionality revealed. I’m not talking about things like Power BI Premium – that’s last month’s news – I’m talking about lots and lots of really cool new features that I had not seen before, so many that I feel completely overwhelmed by them all. There are a few blog posts out there that try to summarise the announcements (apart from the post on the official Power BI blog, check out Matt Allington and Dustin Ryan’s posts) but really your only option is to watch as many of the session videos as you can! I thought I would call out some videos about the new features for you to start off with:
The keynote on the first day was pretty amazing (Amir Netz’s demo should be watched in full) with some highlights being:
The PowerApp custom visual will not only allow you to embed a PowerApp directly in a Power BI report allowing users to write values back to the underlying data source, similar to writeback in SSAS
My star rating measure from this blog post has been turned into a Quick Measure! My name is in Power BI! I am so chuffed! The important thing here is that this is the first example of a calculation submitted to the Quick Measures Gallery making it into the product.
Drillthrough pages and bookmarks will revolutionise how we can navigate through a Power BI report, and turn Power BI into a true data storytelling tool, and are really exciting.
What-if analysis will allow you to move a slider and change values that feed into measures
New AI features to explain what’s happening in your data, the evolution of Quick Insights
The session on the Visio custom visual for Power BI had a lot of detail on how Visio/Power BI integration will work, and was very impressive. You can sign up for the preview here.
Power BI/PowerApps/Flow integration, with Power BI content appearing in Power Apps and vice versa is something I am going to watch with interest.
One major announcement (at least for me) that slipped out in the session on the Common Data Model is that there’s going to be a standalone, SaaS version of Power Query available that will be able to load data into the Common Data Model
Azure Analysis Services will soon have a much better web interface that will allow you to create simple models, and also to import models from Power BI .pbix files. Christian Wade’s session on Azure Analysis Services also has a great extended demo showing off new SSAS Tabular features, including one showing using BISM Normalizer to merge parts of a mode in a .pbix file into a Tabular model.
This is by no means comprehensive. I haven’t been able to watch all the videos yet either, so there may be other important new features in other sessions that I completely missed.
When you connect to a relational database like SQL Server in Power BI/Power Query/Excel Get & Transform you have two choices about how to get the data you need:
You can choose a table from the database and then either use the Query Editor UI or write some M to get the data you need from that table. For example, you might choose a table that has one row for every product that your company sells and then, using the UI, filter that down to only the products that are red.
You can enter a SQL query that gets the data you need.
Something that you might not realise is that if you choose the second option and then subsequently use the UI to apply even more filtering or transformation, then those subsequent steps will not be able to make use of query folding.
As an example of option (1), imagine you connect to the DimProduct table in the SQL Server Adventure Works DW database like so:
The following M query is generated by the Query Editor when you filter the table to only return the red products and remove all columns except EnglishProductName. That’s very easy to do so I won’t describe it, but here’s the M:
let
Source =
Sql.Databases("localhost"),
#"Adventure Works DW" =
Source{
[Name="Adventure Works DW"]
}[Data],
dbo_DimProduct =
#"Adventure Works DW"{
[Schema="dbo",Item="DimProduct"]
}[Data],
#"Filtered Rows" =
Table.SelectRows(
dbo_DimProduct,
each ([Color] = "Red")
),
#"Removed Other Columns" =
Table.SelectColumns(
#"Filtered Rows",
{"EnglishProductName"}
)
in
#"Removed Other Columns"
Using the View Native Query option, you can find out that the following SQL is generated to get this data:
select [_].[EnglishProductName]
from [dbo].[DimProduct] as [_]
where [_].[Color] = 'Red'
It’s pretty clear that query folding is taking place for the filter on “red” and for the selection of the required column.
However, if you enter the following SQL query when you first connect to the database:
select * from dimproduct
And then, after that, filter the table and remove columns in exactly the same way, you get the following M query:
let
Source =
Sql.Database(
"localhost",
"Adventure Works DW",
[Query="select * from dimproduct"]),
#"Filtered Rows" =
Table.SelectRows(
Source,
each ([Color] = "Red")),
#"Removed Other Columns" =
Table.SelectColumns(
#"Filtered Rows",
{"EnglishProductName"})
in
#"Removed Other Columns"
If you now try to use the View Native Query option on either the Removed Other Columns or Filtered Rows steps you’ll find it’s greyed out, indicating query folding is not taking place for those steps:
The query you enter is run and then Power BI applies the filter and selects the column itself in the resultset that the SQL query returns.
This obviously has big implications for performance. The lesson here is that if you’re going to write your own SQL query in the Query Editor, you should make sure it does all of the expensive filters and transformations you need because anything else you do in the query will happen outside the database in Power BI or Excel.
A really useful new feature was added to Get & Transform (the functionality previously known as Power Query) in the latest updates for the Office 365 click-to-run version of Excel 2016: the ability to export and import ODC files containing M queries. This makes sense given that Get & Transform is the new default way for loading data into Excel, but it’s nonetheless very welcome.
It’s very straightforward to use, and all the details are available in the section on “How do I get data from recently used sources, Office Database Connection (ODC) files, other workbook connections, or tables?” halfway down this article:
You just need to right-click on your query in the Queries & Connections pane to export it:
…and after that you can import the ODC file in the normal way when you want to create a new connection.
As always, I want more though. Some ideas/requests:
Power BI Desktop should be able to import and export ODC files in this format too: it would make it much easier to reuse queries. Vote here if you agree.
Power BI should have a central repository for Power BI and Excel users to store these ODC files for easy sharing and reuse by report developers. It’s a shame that the Azure Data Catalog integration with Excel/Power BI hasn’t had any love recently because that would have been the obvious place to create such a repository.
We also desperately need some kind of source control for M queries inside Excel and Power BI (not quite the same requirement as the previous point). I know a lot of people are doing this manually with services like Git, but I would love to be able to check my code in and out directly from the Query Editor.
I’ve also noticed that the old “Load To” dialog (that I found incredibly confusing) has been replaced by the standard Excel Import Data dialog in this release – another improvement. After you hit Close & Load in the Query Editor, this is what you now see: