White Paper On “Planning A Power BI Enterprise Deployment”

I’m pleased to announce that a white paper I co-authored with Melissa Coates on “Planning a Power BI enterprise deployment” has now been published. You can download it from the Power BI white papers site here: https://aka.ms/pbienterprisedeploy

Melissa has already blogged about the white paper here.

Topics covered include the different ways that Power BI can be deployed (as a self-service BI tool or as a corporate BI tool); licensing; preparing data for use in Power BI; choosing a data storage mode (import vs Live connections to SSAS vs DirectQuery); data refresh and the on-premises gateway; best practices for report development; collaboration and sharing (covering apps and content packs); options for consuming reports and data published to Power BI; and security, compliance and administration. If that sounds like a lot, it is: it’s 105 pages long!

It was a real pleasure working with Melissa on this, and I’d also like to thank Meagan Longoria for reviewing it. I’m also extremely grateful to Adam Wilson and an army of Microsoft employees for providing information, answering our questions and correcting our mistakes.

Pre-Conference Training Day On Data Loading With Power BI/Excel Get & Transform/Power Query At PASS Summit

I’m pleased to announce that I’m running a pre-conference training day on “Loading and transforming data with Power BI and Power Query” at this year’s PASS Summit in Seattle. For the full agenda and details on how to register, see:
http://www.pass.org/summit/2017/Sessions/Details.aspx?sid=66485

This full-day session will teach you everything you need to know about loading data into Power BI or Excel; it’s specifically focussed on self-service BI scenarios, but if you want to learn M for SSAS 2017 you might find some of what I cover useful too. Topics covered include using the UI to build queries; using parameters and functions; the M language; data privacy settings; and creating custom data connectors. I hope to see you there!

Microsoft Data Insights Summit Videos Published

The Microsoft Data Insights Summit finished today, and the videos of the sessions have already (!!) been published online here:

https://www.youtube.com/user/mspowerbi/playlists?sort=dd&view=50&shelf_id=9

I can’t remember ever attending a conference where there was so much new functionality revealed. I’m not talking about things like Power BI Premium – that’s last month’s news – I’m talking about lots and lots of really cool new features that I had not seen before, so many that I feel completely overwhelmed by them all. There are a few blog posts out there that try to summarise the announcements (apart from the post on the official Power BI blog, check out Matt Allington and Dustin Ryan’s posts) but really your only option is to watch as many of the session videos as you can! I thought I would call out some videos about the new features for you to start off with:

  • The keynote on the first day was pretty amazing (Amir Netz’s demo should be watched in full) with some highlights being:
  • The session on the Visio custom visual for Power BI had a lot of detail on how Visio/Power BI integration will work, and was very impressive. You can sign up for the preview here.
  • Power BI/PowerApps/Flow integration, with Power BI content appearing in Power Apps and vice versa is something I am going to watch with interest.
  • One major announcement (at least for me) that slipped out in the session on the Common Data Model is that there’s going to be a standalone, SaaS version of Power Query available that will be able to load data into the Common Data Model
  • Matt Masson’s session has a lot of information on building custom data connectors as well as the announcement that M is officially called M!
  • Azure Analysis Services will soon have a much better web interface that will allow you to create simple models, and also to import models from Power BI .pbix files. Christian Wade’s session on Azure Analysis Services also has a great extended demo showing off new SSAS Tabular features, including one showing using BISM Normalizer to merge parts of a mode in a .pbix file into a Tabular model.

This is by no means comprehensive. I haven’t been able to watch all the videos yet either, so there may be other important new features in other sessions that I completely missed.

Last of all, although they don’t contain any new announcements, you can also watch my two M-themed sessions from the conference: Working With Parameters And Functions and Working With Web Services in Power BI/Power Query/Excel Get&Transform.

Query Folding And Writing Your Own SQL Queries In Power Query/Power BI/Excel Get & Transform

When you connect to a relational database like SQL Server in Power BI/Power Query/Excel Get & Transform you have two choices about how to get the data you need:

  1. You can choose a table from the database and then either use the Query Editor UI or write some M to get the data you need from that table. For example, you might choose a table that has one row for every product that your company sells and then, using the UI, filter that down to only the products that are red.
  2. You can enter a SQL query that gets the data you need.

Something that you might not realise is that if you choose the second option and then subsequently use the UI to apply even more filtering or transformation, then those subsequent steps will not be able to make use of query folding.

As an example of option (1), imagine you connect to the DimProduct table in the SQL Server Adventure Works DW database like so:

image

image

The following M query is generated by the Query Editor when you filter the table to only return the red products and remove all columns except EnglishProductName. That’s very easy to do so I won’t describe it, but here’s the M:

let
    Source = 
	Sql.Databases("localhost"),
    #"Adventure Works DW" = 
	Source{
		[Name="Adventure Works DW"]
	}[Data],
    dbo_DimProduct = 
	#"Adventure Works DW"{
		[Schema="dbo",Item="DimProduct"]
	}[Data],
    #"Filtered Rows" = 
	Table.SelectRows(
		dbo_DimProduct, 
		each ([Color] = "Red")
	),
    #"Removed Other Columns" = 
	Table.SelectColumns(
		#"Filtered Rows",
		{"EnglishProductName"}
	)
in
    #"Removed Other Columns"

image

Using the View Native Query option, you can find out that the following SQL is generated to get this data:

select [_].[EnglishProductName]
from [dbo].[DimProduct] as [_]
where [_].[Color] = 'Red'

image

image

It’s pretty clear that query folding is taking place for the filter on “red” and for the selection of the required column.

However, if you enter the following SQL query when you first connect to the database:

select * from dimproduct

image

And then, after that, filter the table and remove columns in exactly the same way, you get the following M query:

let
    Source = 
	Sql.Database(
		"localhost", 
		"Adventure Works DW", 
		[Query="select * from dimproduct"]),
    #"Filtered Rows" = 
	Table.SelectRows(
		Source, 
		each ([Color] = "Red")),
    #"Removed Other Columns" = 
	Table.SelectColumns(
		#"Filtered Rows",
		{"EnglishProductName"})
in
    #"Removed Other Columns"

If you now try to use the View Native Query option on either the Removed Other Columns or Filtered Rows steps you’ll find it’s greyed out, indicating query folding is not taking place for those steps:

image

The query you enter is run and then Power BI applies the filter and selects the column itself in the resultset that the SQL query returns.

This obviously has big implications for performance. The lesson here is that if you’re going to write your own SQL query in the Query Editor, you should make sure it does all of the expensive filters and transformations you need because anything else you do in the query will happen outside the database in Power BI or Excel.

Exporting Power Query/M Queries To ODC Files In Excel 2016 Get & Transform

A really useful new feature was added to Get & Transform (the functionality previously known as Power Query) in the latest updates for the Office 365 click-to-run version of Excel 2016: the ability to export and import ODC files containing M queries. This makes sense given that Get & Transform is the new default way for loading data into Excel, but it’s nonetheless very welcome.

It’s very straightforward to use, and all the details are available in the section on “How do I get data from recently used sources, Office Database Connection (ODC) files, other workbook connections, or tables?” halfway down this article:

https://support.office.com/en-us/article/Unified-Get-Transform-ad78befd-eb1c-4ea7-a55d-79d1d67cf9b3?ui=en-US&rs=en-US&ad=US&fromAR=1

You just need to right-click on your query in the Queries & Connections pane to export it:

image

…and after that you can import the ODC file in the normal way when you want to create a new connection.

As always, I want more though. Some ideas/requests:

  • Power BI Desktop should be able to import and export ODC files in this format too: it would make it much easier to reuse queries. Vote here if you agree.
  • Power BI should have a central repository for Power BI and Excel users to store these ODC files for easy sharing and reuse by report developers. It’s a shame that the Azure Data Catalog integration with Excel/Power BI hasn’t had any love recently because that would have been the obvious place to create such a repository.
  • We also desperately need some kind of source control for M queries inside Excel and Power BI (not quite the same requirement as the previous point). I know a lot of people are doing this manually with services like Git, but I would love to be able to check my code in and out directly from the Query Editor.

I’ve also noticed that the old “Load To” dialog (that I found incredibly confusing) has been replaced by the standard Excel Import Data dialog in this release – another improvement. After you hit Close & Load in the Query Editor, this is what you now see:

image

Data Privacy Settings In Power BI/Power Query, Part 2: Preventing Query Execution

In part 1 of this series I showed how the data privacy settings in Excel Power Query/Get & Transform and Power BI could impact the performance of your queries. In this post I’m going to show you how they can stop a query from running at all.

Let’s say you have the Excel workbook from part 1 of this series, but now instead of using the day name to filter data from a SQL Server table you want to pass that value to a web service. The web service I’m going to use for my examples is one that allows you to search for open data published by the UK government on https://data.gov.uk. It’s very simple: you give it a search term and it returns a JSON document with the search results in, no authentication or anything else required. For example:

https://data.gov.uk/api/3/action/package_search?q=Friday

In fact it doesn’t really matter what it does, just know that it is a web service that I can pass a text parameter to and get a result from.

Here’s a query that reads a single piece of text from the FilterDay table in my Excel workbook:

image_thumb2

…and then passes that value to the web service:

let
    ExcelSource = 
	Excel.Workbook(
		File.Contents("C:\FilterParameter.xlsx")
	, null, true),
    FilterDay_Table = 
	ExcelSource{[Item="FilterDay",Kind="Table"]}[Data],
    ChangedType = 
	Table.TransformColumnTypes(
		FilterDay_Table,
		{{"Parameter", type text}}
		),
    Day = ChangedType{0}[#"Parameter"],
    Output = 
	Web.Contents(
		"https://data.gov.uk/api/3/action/package_search", 
		[Query=[q=Day]]
	),
    ImportedJSON = Json.Document(Output,65001)
in
    ImportedJSON 

This query succeeds if any of the following conditions are true:

  • The data privacy levels of both the Excel workbook and the web service are set to Public
  • The data privacy levels of both the Excel workbook and the web service are set to Organizational
  • The data privacy level of the Excel workbook is set to None and the data privacy level of the web service is set to Public

[See here to find out how to set privacy levels for a data source. Interestingly the data privacy level of the web service cannot be set to None for this query – the UI always prompts for it to be set before the query will run]

Here’s the output of a successful run:

image

Any other combinations of data privacy settings, for example if both the Excel workbook and the web service are set to Private, result in the following error message:

Formula.Firewall: Query ‘WebFunctionSucceeds’ (step ‘ImportedJSON’) is accessing data sources that have privacy levels which cannot be used together. Please rebuild this data combination.

image

While it’s ok to send data from one Public data source to another Public data source, or from one Organizational data source to another Organizational data source, you cannot send data from one Private data source to any other data source, or even send data from a Public data source to a Private data source.

In the example in my previous post when the engine found it wasn’t allowed to send data from one source to another because of the data privacy rules used it was still able to run the query, but had to do so in a less efficient way. In this example there is no way to run this query without sending data from the Excel workbook to the web service – you can’t call this web service without sending a search term to it. As a result, if incompatible data privacy levels are set then the query returns the error shown.

Notice that in the query above I’m reading data from Excel and sending it to the web service in a single M query. This is deliberate! In the next post in this series I’ll be looking at examples where the engine can’t work out what it’s supposed to do and errors, even if the privacy levels used suggest the query should run.

Data Privacy Settings In Power BI/Power Query, Part 1: Performance Implications

One of the most confusing, under-documented and widely-misunderstood features of Power BI and Power Query (or Excel “Get & Transform” or whatever you want to call it) are the data privacy settings. I get caught out by them all the time, so I thought it would be a good idea to write a series of blog posts demonstrating how they work and what effect they have in different scenarios using example M queries.

Before carrying on, I suggest you read the official Microsoft documentation on the subject here:
https://powerbi.microsoft.com/en-us/documentation/powerbi-desktop-privacy-levels/
It gives you a good grounding in what the different data privacy levels are and where you can set them in the Power BI UI. The same options are available in Excel with Power Query/Get & Transform.

In this first post I’m going to look at what the performance implications of different privacy levels can be. Let’s say you have two data sources. First, an Excel workbook with a single table in that contains the name of a day of the week:

image

The second is the DimDate table in the Adventure Works DW SQL Server sample database:

image

Here’s an M query called FilterDay that returns the day name from the table in the Excel workbook:

let
    Source = Excel.Workbook(
		File.Contents("C:\Filter.xlsx")
		, null, true),
    FilterDay_Table = Source{[Item="FilterDay",Kind="Table"]}[Data],
    ChangedType = 
	Table.TransformColumnTypes(
		FilterDay_Table,
		{{"Parameter", type text}}),
    Output = ChangedType{0}[#"Parameter"]
in
    Output

image

Here’s an M query called DimDate that filters data from the DimDate table in the Adventure Works DW database, returning only the rows where the EnglishDayNameOfWeek column matches the value returned by the FilterDay query above.

let
    Source = Sql.Databases("localhost"),
    DB = Source{[Name="Adventure Works DW"]}[Data],
    dbo_DimDate = DB{[Schema="dbo",Item="DimDate"]}[Data],
    RemovedColumns = Table.SelectColumns(dbo_DimDate,
		{"DateKey", "EnglishDayNameOfWeek"}),
    FilteredRows = Table.SelectRows(RemovedColumns, 
		each ([EnglishDayNameOfWeek] = FilterDay))
in
    FilteredRows

The first time you run this second query you’ll be prompted to enter credentials to connect to SQL Server, and then (assuming you haven’t set any of the options that get Power BI to ignore privacy levels) you’ll see the “Information is required about data privacy” prompt:

image

Clicking “Continue” allows you to set a privacy level for the SQL Server database and the root of the drive that the Excel workbook is on:

image

The dropdown boxes in the left-hand column allow you to be more specific in what you set privacy levels on; in the case of the Excel file you can choose to set permissions at all folders in the path down to the workbook and the workbook itself; in the case of SQL Server there is the option of setting privacy levels on the database rather than the server.

You can choose any of the three privacy levels (Public, Organizational or Private) for the SQL Server instance and the Excel file and the query will still run:

image

This is where things start to get confusing. At this point the workbook has a privacy level set to None, which is the default for newly-created data sources:

image

However, this “None” setting means it inherits its data privacy settings from the level set for the c:\ drive set earlier, even though this setting isn’t displayed on the “Data sources in current file” tab. Hopefully the fact this is happening will be addressed in a future release: I think the current UI is misleading.

For the purposes of this blog post the data privacy settings for SQL Server are irrelevant and it’s the settings on the Excel workbook that are important. So, to make things clearer, let’s say you explicitly set the data privacy setting for SQL Server to private and the data privacy setting for the Excel workbook itself to Public like so:

image

If you right-click on the FilteredRows step of the query that returns data from the DimDate table in SQL Server and select “View Native Query” (see here for more details on this feature) to see the SQL query generated in the background for this step like so:

image

You’ll see that query folding is taking place for this step and the filter on the EnglishDayNameOfWeek column is taking place in the SQL query:

image

select [_].[DateKey],
    [_].[EnglishDayNameOfWeek]
from 
(
    select [DateKey],
        [EnglishDayNameOfWeek]
    from [dbo].[DimDate] as [$Table]
) as [_]
where [_].[EnglishDayNameOfWeek] = 'Friday'

Query folding is almost always a good thing for the performance of a query. For more details on what query folding is, see here.

If, however, you set the privacy level for the Excel workbook to Private like so:

image

…even though the DimDate query still works, query folding does not take place for the Filtered Rows step. The View Native Query right-click option is greyed out, and Profiler shows that the following SQL is executed when the query is refreshed:

select [$Ordered].[DateKey],
    [$Ordered].[EnglishDayNameOfWeek]
from 
(
    select [DateKey],
        [EnglishDayNameOfWeek]
    from [dbo].[DimDate] as [$Table]
) as [$Ordered]
order by [$Ordered].[DateKey]

Note that there is no WHERE clause in this query and that the whole of the DimDate table is returned from SQL Server.

Data from a data source that has a privacy level of Private can never be sent to another data source. That is exactly what needs to happen for query folding to take place though: a value from the Excel workbook – the text “Friday” – needs to be embedded in the WHERE clause of the SQL query sent to SQL Server in order for filtering to happen inside the database.  The risk with query folding is that a DBA could monitor the queries that are being run on SQL Server, look at the WHERE clauses, and see data from your Excel workbook. That’s maybe not a problem with day names, but potentially an issue if you were working with more sensitive data like customer names or addresses. Therefore, with the Excel workbook’s privacy level set to Private, the whole of the DimDate table is downloaded into the M engine and the filtering has to take place there to maintain the privacy of the data in Excel. The query still runs but it will probably be a lot slower than it would have been had query folding taken place. With the privacy level of the Excel workbook set to Public, on the other hand, it is ok to send data from Excel to SQL Server so query folding does take place.

To sum up, in this post I have shown how different data privacy settings can affect the performance of a query by determining whether query folding takes place or not. In part 2 of this series I will show how different data privacy settings can determine whether a query executes at all.

[This post was updated on June 19th 2017 with details on how the “None” privacy level behaves and how permissions can be set at different levels and inherited – thanks to Curt Hagenlocher for providing these details]