Avoiding Duplication Of Database Connection Information In Power BI

In a year’s time there will be a great opportunity for someone to present a session on “Power BI Development Best Practices” at the PASS Summit. Before then, we will all have to work out what those best practices actually are – probably the hard way. With that in mind, here’s a suggestion for one good practice that came out of a conversation at this year’s PASS Summit (thank you Prathy).

If you’re connecting to a data source like SQL Server in Power BI (or Power Query for that matter) you’re probably going to be importing multiple tables of data. For example, if I was importing data from my local SQL Server instance and the Adventure Works DW database, I might see something like this in the Navigator pane in Power BI:


Clicking the Load or Edit buttons would create five different queries to get data, one from each of the selected tables:


The problem here is that each query duplicates the connection information for the SQL Server database; for example the M code for the FactInternetSales query looks like this:

    Source = 
        Sql.Database("chriszbook", "adventure works dw"),
    dbo_FactInternetSales = 

That means that if you ever need to change the server or database that the queries point to – maybe because the server has migrated, or because you’re moving the reports from dev to test to production – then you have to edit each of these five queries. Which would be a pain.

Ideally Power BI would create a single connection that each of these queries could share, something like a data source in SSRS. In fact I can see that this has already been raised as a suggestion on the forum here and is under review; I’m sure some more votes would raise its profile. However there are certainly a number of different ways you can avoid this kind of duplication by making your own changes to these queries though.

One possible approach would be to create new Power BI queries that returned the names of the SQL Server instance and the database name, and for each of your main queries to reference these queries. To do this you would need to:

1) Create a new query using the Blank Query option:


2) Call the query SQLServerInstanceName:


3) Open the Advanced Editor window by clicking on the Advanced Editor button on the Home tab, deleting all the automatically generated code in there and replacing it with the name of the SQL Server instance that you want to connect to in double quotes:


4) Repeat steps 1-3 to create a new query called SQLServerDatabaseName that returns the name of the database that you want to connect to. It might also be a good idea to create a new group to separate these new queries from the ones that load the data:


5) Edit the queries that actually return data so that instead of hard-coding the instance name and database name, they take these values from the queries you have just created. A Power BI query can return a value of any data type (not just a table), and the queries created in steps 1-4 return values of type text – the names of the SQL Server instance and database to connect to. These queries can now be used as variables in other queries, so after editing the FactInternetSales query shown above would look like this:

    Source = 
        Sql.Database(SQLServerInstanceName, SQLServerDatabaseName),
    dbo_FactInternetSales = 


Now, if you ever need to change the connection you just need to change the values in these two queries rather than edit every single query that returns data.

There are other ways of solving this problem: for example you could have a query that returns the output of Sql.Database() (as used in the Source step in the FactInternetSales query shown above)and have all the other data load queries reference that. I like the approach I show here though because it makes it very easy to see the values for the SQL Server instance and database that are currently in use. If you’re being even more ambitious – maybe because you have many queries in many .pbix files that connect to the same database – you could even store connection information somewhere outside the .pbix file, maybe in another SQL Server database. But if you did that, you would then need to worry about the connection information for that database too…

Loading English LiDAR Open Data Into Excel And Power Map Using Power Query

…or, if you want the clickbait title of this post, “Creating cool Minecraft-type landscapes in Power Map”…

I’m a big fan of Owen Boswarva’s UK open data blog, and yesterday this post on English LiDAR data released by the Environment Agency caught my eye:


I won’t repeat what he says because there’s nothing I could add to it, but basically this data is a 3D scan of large parts of England at extremely detailed resolutions. You can download the data from this site:


My first thought after reading this was post was that it should be possible to load this data into Excel (in general) and Power Map (in particular) using Power Query fairly easily. And indeed it was – so in this blog post I’ll show you how.

First of all, here’s a sample of the output. This is a Power Map custom map showing tile SU9090, which shows a section of the M40 motorway and its surroundings near High Wycombe:


Zooming in on an area just below the centre of the above screenshot you can see several distinctively-shaped office buildings, the M40 going across the screen and, on the top right-hand side, a Tesco superstore:


Here’s the same area shown in Bing Maps (http://binged.it/1RkakM7):


Here’s the obligatory video:

How did I do this? The first step is to load the data into the Excel Data Model using Power Query. I created a Power Query function called LoadLiDARData to do this, and you can get a sample workbook with it in from here. There isn’t much to say about the function – it’s not trivial, but not very interesting in itself. To use it, in Excel go to the Power Query tab on the ribbon and open the Workbook Queries pane. To load data from a single .asc file downloaded from the Environment Agency site, right-click to invoke the function and enter the filename (including the full path) of that file. The Power Query Editor window will open but you don’t need to do anything here except click the Close & Load button. I recommend you load the output to the Excel Data Model rather than a table in the worksheet because a single file contains a lot of data!

Next, go to the Insert tab on the Excel ribbon and click the Map button to open Power Map. Then click New Scene\New Custom Map to create a new custom map. You need to make sure that the Min values for the X and Y axis are set to 0, the Max value for X is set to the number of columns and the Max value for Y is set to the number of rows in the data you’re working with. Also, check the Flip Axis button for the Y axis.


Then, on the right-hand side of the screen check the ColumnNumber and RowNumber boxes and set them to be the X Coordinate and Y Coordinate respectively:


Click the Next button, then check Height to use as the height of a bar graph:


Finally – and this is important – go to Settings and adjust the Thickness property appropriately. If you don’t do this you won’t be able to pick out any detail; I have used 18% as a good starting setting. Ticking the Lock current scale box is also a good idea, so that when you zoom in the bars don’t disappear.


This is no more than a proof-of-concept. I’ve only tested my code with a very limited amount of data, and there are a lot of obvious improvements that could be made: for a start, it would be good to convert the grid co-ordinates used to latitude and longitude so that this data can be used with the maps in Power Map. I could also imagine some really useful applications for this, such as combining a map like this with snowfall data and creating an animation showing where snow falls over a landscape during the winter.

Of course a dedicated tool will do a much better job of visualising this data than Excel, but if you have the right edition of Excel 2013 then you can get everything you need to do this for free and in Excel 2016 Power Query and Power Map (as Get & Transform and 3D Maps) are available out-of-the-box. It also sounds like Power Map will be integrated with Power BI at some point too:


Referencing Individual Cell Values From Tables In Power Query

[This blog post is relevant to Power Query in Excel 2010/2013, the Get & Transform section on the Data tab in Excel 2016, and the Get Data screen in Power BI Desktop. I’m going to use the term ‘Power Query’ in this post to refer to all of the previously mentioned functionality]

Sometimes, when you’re working with a table of data in Power Query, you want to be able to get the value from just one cell in that table. In this blog post I’ll show you how you can do this both in the UI and in M code, and talk through all of the more advanced options available in M. Incidentally this is a topic I covered in some detail in the M chapter of my Power Query book, but since that book is now somewhat out-of-date I thought it was worth covering again in a blog post.

Referencing Cell Values In The UI

Imagine your data source is an Excel table that looks like this:


If you import it into Power Query, and you want to get the value in the cell on the second row in ColumnB, then in the Query Editor window you just need to right-click on that cell and select Drill Down:


…and bingo, you have the value 5 returned:


Note that this is the value 5, and not the value 5 in a cell in a table – a Power Query query can return a value of any data type, and in this case it’s going to return a single integer value rather than a value of type table. If you load the output of this query into Excel you’ll still see it formatted as a table, but if you’re using the output of this query as an input for another query (for example, you might want to read a value from Excel and use that value as a filter in a SQL query) it’s much more convenient to have an integer value than a table with one column and one row.

Referencing Cell Values in M

You can see in the screenshot above the M code generated for the Drill Down by the UI, and probably guess how it works. Here’s a cleaned-up version of the query from the previous section for reference:

    Source = Excel.CurrentWorkbook(){[Name="SourceData"]}[Content],
    ChangeDataTypes = Table.TransformColumnTypes(
                                        {{"ColumnA", Int64.Type}, 
                                         {"ColumnB", Int64.Type}, 
                                         {"ColumnC", Int64.Type}}),
    GetMiddleCell = ChangeDataTypes{1}[ColumnB]

There are three steps here:

  • Source reads the data from the Excel table
  • ChangeDataTypes sets the data types for the three columns in the table to be Whole Number
  • GetMiddleCell returns the value from the middle cell of the table returned by the ChangeDataTypes step

M allows you to refer to individual values in tables by a system of co-ordinates using the name of the column and the zero-based row number (tables in Power Query are always assumed to be sorted, therefore it making it possible to ask for a value from the nth row). So the expression

returns the value from the cell on the second row in the column called ColumnB of the table returned by ChangeDataTypes, which is 5. Similarly, the expression

returns the value 3, which is the value in the column ColumnC on the first row.

It’s also worth pointing out that the row and column reference can be in any order, so the expression


…returns the same value as


As you’ll see in a moment, the order that the row and column reference come in could be important.

Referring To Rows Or Columns That Don’t Exist

What happens if you write an expression that refers to a row and/or column that doesn’t exist? You get an error of course! So using our example query, the expressions




…will both return errors because there isn’t a fifth row in the table, and there isn’t a column called ColumnD.

However, instead of an error you can return a null value by using the ? operator after the reference. For example, the expression


returns the value null instead of an error:


You have to be careful though! The expression


still returns an error, not because there isn’t a fifth row but because the reference to the fifth row returns a null value and there is no column called ColumnB in this null value.


The solution here is to reverse the order of the references, like so:


or even better use ? with both references:



Unfortunately using ? won’t stop you getting an error if you use a negative value in a row reference.

The Effect Of Primary Keys

Did you know that a table in Power Query can have a primary key (ie a column or columns whose values uniquely identify each row) defined on it? No? I’m not surprised: it’s not at all obvious from the UI. However there are several scenarios where Power Query will define a primary key on a table, including:

  • When you import data from a table in a relational database like SQL Server, and that table has a primary key on it
  • When you use the Remove Duplicates button to remove all duplicate values from a column or columns, which behind the scenes uses the Table.Distinct() M function
  • When you use the Table.AddKey() M function on a table

The presence of a primary key affects how the Drill Down functionality works, and gives you another way of referencing individual cells.

Consider the following Excel table, basically the same as the table you’ve seen already but with a new column that uniquely identifies each row:


If you load the table into Power Query and then right-click on the column MyKeyColumn and select Remove Duplicates, you will have set this key as a primary key:


(By the way, you can use the Table.Keys() function to see what keys are defined on a table in Power Query, and there’s an example query showing how to use this in the sample workbook for this post).

Having done this, if you use the Drill Down functionality to get the value from the second row in ColumnB once again, this time you’ll get a query something like this:


    Source = Excel.CurrentWorkbook(){[Name="SourceData3"]}[Content],
    ChangedDataTypes = Table.TransformColumnTypes(
                                      {{"MyKeyColumn", type text}, 
                                      {"ColumnA", Int64.Type}, 
                                      {"ColumnB", Int64.Type}, 
                                      {"ColumnC", Int64.Type}}),
    RemovedDuplicates = Table.Distinct(
    SecondRow = RemovedDuplicates{[MyKeyColumn="SecondRow"]}[ColumnB]

The last step is the important one to look at. The row in the reference is no longer by the row number but by the value from the primary key column instead:



You can still use the previous row number-based notation, but when a table has a primary key column defined on it you can also use a value from the primary key column to identify a row.

A Last Warning About Performance

Being able to reference individual values like this is incredibly useful for certain types of query and calculation. However, bear in mind that there are often many different ways of solving the same problem and not all of them will perform as well as each other. One obvious use of the techniques I’ve shown in this post would be to write a previous period growth calculation, where you need to refer to a value in a previous row in a table – but my experience is that writing calculation using row and column references prevents query folding and leads to poor performance, and an alternative approach (maybe like the ones shown here and here involving joins) often performs much better. There aren’t any general rules I can give you though, you just need to make sure you test thoroughly.

You can download the sample workbook for this post here.

Loading Twitter Archive Data In Power Query

If you’re a Twitter user (I’m @Technitrain if you aren’t following me already) you may be aware that you can download your entire Twitter history – all of the data from all of your tweets – as a series of .js files. All the details on how to do this are here:


Since Power Query can work with JSON data I thought it would be easy to use it to analyse my Twitter archive… but while it’s possible, it’s not entirely straightforward. The problem seems to be that Power Query doesn’t like the first line in each .js file that Twitter gives you. Removing that first line isn’t too difficult but it requires some M code, so here’s a function (I’ve called it GetTwitterArchiveFile) that handles that problem. You give it the binary data from the file and it returns a table containing all the data from that file:

(TwitterFile as binary)=>
    //Read data from file and interpret as Lines
    Source = Lines.FromBinary(TwitterFile),
    //Remove the first line
    RemoveFirstRow = List.Skip(Source,1),
    //Convert back to Binary
    ConvertBackToBinary = Lines.ToBinary(RemoveFirstRow),
    //Now convert to JSON
    ConvertToJSON = Json.Document(ConvertBackToBinary),
    //Flatten to a table
    ConvertToTable = Table.FromList(ConvertToJSON, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    //Expand the first set of columns
    ExpandColumns = Table.ExpandRecordColumn(ConvertToTable, "Column1", {"source", "entities", "geo", "id_str", "text", "id", "created_at", "user", "retweeted_status", "in_reply_to_status_id_str", "in_reply_to_user_id", "in_reply_to_status_id", "in_reply_to_screen_name", "in_reply_to_user_id_str"}, {"source", "entities", "geo", "id_str", "text", "id", "created_at", "user", "retweeted_status", "in_reply_to_status_id_str", "in_reply_to_user_id", "in_reply_to_status_id", "in_reply_to_screen_name", "in_reply_to_user_id_str"})

Here’s an example of how to use the above function: it’s another function (called GetTwitterFullArchive) which, when you pass it the path of your tweets folder (this will be wherever you unzipped the download that you get from Twitter) returns the combined contents of all of the .js files in that format by calling GetTwitterArchiveFile() for each one:

(TweetsFolderPath as text) =>
    //Connect to Tweets folder
    Source = Folder.Files(TweetsFolderPath),
    //Remove everything but Content column
    #"Removed Other Columns" = Table.SelectColumns(Source,{"Content"}),
    //Use Custom Column to call GetTwitterArchiveFile for each .js file in the folder
    #"Added Custom" = Table.AddColumn(#"Removed Other Columns", "Custom", each GetTwitterArchiveFile([Content])),
    //Remove the Content columns
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Content"}),
    //Expand all columns in the Custom column
    #"Expanded Custom" = Table.ExpandTableColumn(#"Removed Columns", "Custom", {"source", "entities", "geo", "id_str", "text", "id", "created_at", "user", "retweeted_status", "in_reply_to_status_id_str", "in_reply_to_user_id", "in_reply_to_status_id", "in_reply_to_screen_name", "in_reply_to_user_id_str"}, {"source", "entities", "geo", "id_str", "text", "id", "created_at", "user", "retweeted_status", "in_reply_to_status_id_str", "in_reply_to_user_id", "in_reply_to_status_id", "in_reply_to_screen_name", "in_reply_to_user_id_str"})
    #"Expanded Custom"

Invoking this function in a query, for example like this:

    Source = GetFullTwitterArchive("C:\Users\Chris\Downloads\TwitterArchive\data\js\tweets")

Gives you the following output:


As you can see, there are plenty of other columns that can themselves be expanded, but this is a good starting point for any analysis.

There’s nothing really ground-breaking in what I’ve done here – it’s a fairly standard example of how you can use Power Query functions to combine data from multiple files, very similar to this example of combining data from multiple Excel files.

There’s absolutely loads of interesting data that you can play with here, but to start with here’s a query that finds the top 10 people I have replied to on Twitter:

    Source = GetFullTwitterArchive("C:\Users\Chris\Downloads\TwitterArchive\data\js\tweets"),
    #"Removed Other Columns" = Table.SelectColumns(Source,{"in_reply_to_screen_name"}),
    #"Grouped Rows" = Table.Group(#"Removed Other Columns", {"in_reply_to_screen_name"}, {{"Count", each Table.RowCount(_), type number}}),
    #"Filtered Rows" = Table.SelectRows(#"Grouped Rows", each ([in_reply_to_screen_name] <> null)),
    #"Sorted Rows" = Table.Sort(#"Filtered Rows",{{"Count", Order.Descending}}),
    #"Kept First Rows" = Table.FirstN(#"Sorted Rows",10)
    #"Kept First Rows"

Here’s the output as a table and chart:



Looks like Jamie Thomson is the winner by a wide margin!

Here’s an example of a  NodeXL graph I built from this data, showing the relationships between users who I have mentioned together in a single tweet:


You can download the sample workbook for this post, containing all the functions (but not my data) here.

Exporting All M Code From Power Query In Excel 2013

Here’s a tip that I found out about on the Power Query Technet forum that I thought was worth repeating. If you ever need a quick way of exporting all the M code for all of the queries in an Excel 2013 workbook, just open the Power Query window and click the Send a Frown button shown here:


Then, when the Send Feedback dialog appears, make sure Include Formulas is ticked then click OK:


When you do that, you’ll get an email created for you that contains a whole lot of debugging information, plus all of the M code for your queries:


Obviously, don’t send this email to Microsoft!

It’s quite easy to see the individual queries. You then need to go to your new workbook, create a new query by selecting the Blank Query option under Other Sources, and then open the Advanced Editor window and paste the code for each query in. However, when you do that you will need to modify the code a bit. There are three pieces of code you will need to delete:

  • At the beginning of the M code, where it says
    section Section1;
  • At the beginning of each query, a piece of code that contains the original name of the query:
    shared MyOriginalQueryName =
  • At the very end, a semi-colon

Of course in Excel 2016 we’ll have much better options for copying, pasting and moving queries in VBA, but until then we’ll have to live with hacks like this.

Running Your Own MDX And DAX Queries In Power BI Desktop

Every time there’s a new release of Power Query or Power BI Desktop, I always check to see if there are any interesting new M functions that have been added (I used #shared to do this, as detailed here). For the RTM version of Power BI Desktop I spotted two new functions:


As well as ODBC connections, we can now use OLEDB and ADO.NET data sources – although they aren’t shown in the UI yet. And you know what this means… with an OLEDB connection we can now run our own MDX and DAX queries against SSAS data sources! I assume this will be coming in Power Query in Excel soon too.

Here’s an example query showing how to use OleDB.Query() to run an MDX query against the Adventure Works DW cube in SSAS Multidimesional:

    Source = OleDb.Query(
              "Provider=MSOLAP.5;Data Source=localhost;
               Initial Catalog=Adventure Works DW 2008", 
              "select {measures.[internet sales amount]} on 0, 
               [date].[calendar].[calendar year].members on 1 
               from [adventure works]"

As you can see, it’s pretty straightforward: you just need to supply a connection string and a query. You will need to tell Power BI Desktop which credentials to use when running the query the first time you connect to SSAS, and that’s probably going to be Windows:


You will also see a prompt the first time you run the query, asking for permission to run a Native Database Query:


This prompt will appear each time a different MDX query is run; you can turn off this prompt in the Options dialog on the Security tab by unchecking the Require user approval for new native database queries box:


Here’s the output of the MDX query from the example code:


Checking Columns Are Present In Power Query

Something I was meaning to mention in my previous post (but forgot about…) was that in a lot of cases you don’t really care if your output contains all the required columns – it’s enough just to check that your input contains all the required columns. Luckily M has a function called Table.HasColumns() to help you do this. For example, using the csv source file from my previous post, which should have three columns called Product, Month and Sales, the following query will return true if the source file has these columns and false if it doesn’t:

    Source = Csv.Document(File.Contents("C:\MissingColumnDemo.csv"),[Delimiter=",",Encoding=1252]),
    PromotedHeaders = Table.PromoteHeaders(Source),
    CheckColumns = Table.HasColumns(PromotedHeaders, {"Product", "Month", "Sales"})