Avoiding Duplication Of Database Connection Information In Power BI

In a year’s time there will be a great opportunity for someone to present a session on “Power BI Development Best Practices” at the PASS Summit. Before then, we will all have to work out what those best practices actually are – probably the hard way. With that in mind, here’s a suggestion for one good practice that came out of a conversation at this year’s PASS Summit (thank you Prathy).

If you’re connecting to a data source like SQL Server in Power BI (or Power Query for that matter) you’re probably going to be importing multiple tables of data. For example, if I was importing data from my local SQL Server instance and the Adventure Works DW database, I might see something like this in the Navigator pane in Power BI:

image

Clicking the Load or Edit buttons would create five different queries to get data, one from each of the selected tables:

image

The problem here is that each query duplicates the connection information for the SQL Server database; for example the M code for the FactInternetSales query looks like this:

[sourcecode language=”text” padlinenumbers=”true”]
let
Source =
Sql.Database("chriszbook", "adventure works dw"),
dbo_FactInternetSales =
Source{[Schema="dbo",Item="FactInternetSales"]}[Data]
in
dbo_FactInternetSales
[/sourcecode]

That means that if you ever need to change the server or database that the queries point to – maybe because the server has migrated, or because you’re moving the reports from dev to test to production – then you have to edit each of these five queries. Which would be a pain.

Ideally Power BI would create a single connection that each of these queries could share, something like a data source in SSRS. In fact I can see that this has already been raised as a suggestion on the forum here and is under review; I’m sure some more votes would raise its profile. However there are certainly a number of different ways you can avoid this kind of duplication by making your own changes to these queries though.

One possible approach would be to create new Power BI queries that returned the names of the SQL Server instance and the database name, and for each of your main queries to reference these queries. To do this you would need to:

1) Create a new query using the Blank Query option:

image

2) Call the query SQLServerInstanceName:

image

3) Open the Advanced Editor window by clicking on the Advanced Editor button on the Home tab, deleting all the automatically generated code in there and replacing it with the name of the SQL Server instance that you want to connect to in double quotes:

image

4) Repeat steps 1-3 to create a new query called SQLServerDatabaseName that returns the name of the database that you want to connect to. It might also be a good idea to create a new group to separate these new queries from the ones that load the data:

image

5) Edit the queries that actually return data so that instead of hard-coding the instance name and database name, they take these values from the queries you have just created. A Power BI query can return a value of any data type (not just a table), and the queries created in steps 1-4 return values of type text – the names of the SQL Server instance and database to connect to. These queries can now be used as variables in other queries, so after editing the FactInternetSales query shown above would look like this:

[sourcecode language=”text” highlight=”3″]
let
Source =
Sql.Database(SQLServerInstanceName, SQLServerDatabaseName),
dbo_FactInternetSales =
Source{[Schema="dbo",Item="FactInternetSales"]}[Data]
in
dbo_FactInternetSales
[/sourcecode]

image

Now, if you ever need to change the connection you just need to change the values in these two queries rather than edit every single query that returns data.

There are other ways of solving this problem: for example you could have a query that returns the output of Sql.Database() (as used in the Source step in the FactInternetSales query shown above)and have all the other data load queries reference that. I like the approach I show here though because it makes it very easy to see the values for the SQL Server instance and database that are currently in use. If you’re being even more ambitious – maybe because you have many queries in many .pbix files that connect to the same database – you could even store connection information somewhere outside the .pbix file, maybe in another SQL Server database. But if you did that, you would then need to worry about the connection information for that database too…

32 thoughts on “Avoiding Duplication Of Database Connection Information In Power BI

  1. The problem is maybe bigger when referencing for example other excel files in PowerQuery …
    A database will not be so often changed, although changing connections from a lIve to a test db could be interesting. But the real issue is the links to other files , like Excel files that contain data needed in other excel files.
    When you import data from another Excel file with PowerQuery, then the full path is stored in the PowerQuery code. This is dangerous because how often do we not move Excel files from one location to another, which will break the powerquery. Or also depending on which environment you are working now (live/test) , you may need to connect to another excel file for getting different data originating from these environments. Managing this in the best possible way is a real challenge!

    We are using a combination of different components to manage this :
    * a ‘Parameter’ sheet in the Excel which contains an indication of the ‘environment’ : this is manually maintained in the excel
    * a PowerQuery ‘Parameter’ which takes parameters from a parameter table in a SQL db (A SQL db connection will not often change) : this PowerQuery contains a filter based on the ‘environment’: so it takes the parameters for the environment defined in the ‘Parameter’ sheet.

    * These parameters contain paths which will be used to link to the excel files which contain data which will be used in the powerqueries. These powerqueries look then like :

    Source = Excel.Workbook(File.Contents(Function_GetParameter(“PowerUpdate_Path_Basis”)&”…Source Excel file name …xlsx”), null, true),
    …….

    Function Getparameter looks like :
    let
    Function_GetParameter = (ParameterName as text) =>

    let
    Parameter.Source = Excel.CurrentWorkbook(){[Name=”….PQ name which retrieves the parameters……”]}[Content],
    Parameter.Data = Table.SelectRows(Parameter.Source, each ([ParameterName] = ParameterName)),
    Parameter.Value =
    if Table.IsEmpty(Parameter.Data) = true then
    null
    else
    Record.Field(Parameter.Data{0},”ParameterValue”)
    in
    Parameter.Value
    in
    Function_GetParameter

    Tx for giving you feedback on this how you manage this and how we could improve this further.

    1. Hi Kurt, there’s nothing at all to improve about this. Personally I prefer to use separate named ranges (instead of a single table) in Excel to hold my parameters and then individual queries to get the values from them, because I think it’s simpler than using a table but that’s a matter of personal taste.

      1. Chris
        tx for the feedback…one more question a bit in line with this..but not really fitting maybe in this post…so sorry for messing up …feel free to answer offline or maybe via another post …

        I’m looking already quite some time for info about how powerquery treats the order of multiple powerqueries in 1 excel sheet.
        So if we have multiple powerqueries…how does determine powerquery which one to run first … This could be important when you have for example a powerquery that contains connection info which is then used in the connection string in another PowerQuery… we need to be sure that the first PowerQuery is refreshed, and certainly avoid that both powerqueries are running at the same time, so we need to be sure that the first PowerQuery is finished before the other starts.

        Is there documentation available about the execution order, … ?

        tx for the feedback
        Kurt

      2. Ok Chris, but can we also be sure that it will go fine when, like the example above , we have a PowerQuery to retrieve parameters, which are then available in an excel table and which are called with these kind of commands :

        Parameter.Source = Excel.CurrentWorkbook(){[Name=”….PQ name which retrieves the parameters……”]}[Content],

        I’m not sure whether PQ will detect dependancy on this ? …Am I correct to say that with this command you refer to a excel table and not immediately to a PowerQuery. Will PQ be so smart to detect also here the relation ?

        I seem to have issues with that causing some excels to fail from time to time. To be complete…I’m using power update to schedule the refreshes, so maybe it is more a power update problem ? Anyway sometimes the excel fails , sometimes not and I do not get real info on the exact cause…

      3. To be clear: you are using Power Query to load parameter values from an external data source into an Excel table, then using a second Power Query query to load the values from the Excel table? If so, no I don’t think Power Query would be able to detect that kind of dependency.

      4. I think that this is indeed the way it works in this setup … we use this kind of command to retrieve the parameters from the list:
        Excel.CurrentWorkbook(){[Name=”….PQ name which retrieves the parameters……”]}[Content],

        I assume that this considers the paramterlist value as a Excel table and not as a PowerQuery resultset?

        some kind of function like this could maybe do the trick ???
        let GetParameterFromPQList = (ParmName) =>

        let
        SourceList = …name of PQ that returns the parameters…. ,
        #”Filtered Rows” = Table.SelectRows(SourceList, each [ID] = ParmName) ,
        Parm_Text = #”Filtered Rows” {0}[ID]
        in
        Parm_Text
        in #”Invoked FunctionGetParameterFromPQList”

      5. Yes, if you are using Excel.CurrentWorkbook() to get data from an Excel table then Power Query will treat it as an Excel table – it will not know that Power Query has been used to populate the table. However, why would you ever need to refresh the query that populates the parameter table? Presumably you are loading default values into Excel and then users can change them afterwards. If you refreshed that first query, wouldn’t users’ changes be overwritten?

      6. we use this to define the base location of some excel files that hold ‘master data’ powerqueries
        When we would move these excel master files to another location, we just need to change this parm in 1 location and all other excel files that read from these master excel files would keep on working (because they will get the new path via this parameter) without manual changes needed in every single excel file.
        Feel free to suggest other ways to manage this.

  2. Great tip – save me from changing server name 21 times in my pbix, so thanks! One last step to add: Hide the SQLServerDatabaseName and SQLServerInstanceName tables from the report view.

  3. This is a great feature and it would be even better if you could still update your data once publishing your reports to the Power BI site, but for some reason I can’t get it to update when using parameters.

    1. Are you talking about scheduled refresh? This should work, but only if you have followed my instructions *exactly*. I heard last week that if you use the formula bar to edit the parameter queries rather than the Advanced Editor then refresh doesn’t work; the M code for these queries must not contain a let expression.

      1. Thank you Chris! You spottet my error. Apparently when you edit the value using the formula bar “let” is added.

        Now I just get another error, but that is related to a totally different subject. Get this “[Expression error] The name ‘Table.TransformColumnNames’ wasn’t recognized” which I don’t really understand. I think it’s from the code I generated to give me a date tabel.

  4. Why not just have a query that connects to the database :
    let Source = Sql.Database(“chriszbook”, “adventure works dw”)
    in Source
    … and reference it in subsequent queries?
    Did I miss something?

    1. That’s a perfectly good approach too – but when you’re working with SQL Server, you’ll typically start by importing a lot of tables using the UI and it’s slightly easier to modify these queries using the approach described here than to create a source query and reference it.

  5. Great post. I am using Mysql as source. I have tried to use your method by I need to replace dw_nl with “SQLServerDatabaseName” but I cant get the syntax right. Any tips for solving this?

    let
    Source = MySQL.Database(SQLServerInstanceName, SQLServerDatabaseName, [ReturnSingleDatabase=true]),
    dw_nl_leasing_offers = Source{[Schema=SQLServerDatabaseName,Item=”leasing_offers”]}[Data]
    in
    dw_nl_leasing_offers

      1. Yes. 🙁 I am trying to use your method so I dont have to edit all the connections manual when I change from one MYsql DB to an other DB with other schemaname. (same server).
        So I need to do something like:
        let
        Source = MySQL.Database(SQLServerInstanceName, SQLServerDatabaseName, [ReturnSingleDatabase=true]),
        SQLServerDatabaseName_leasing_offers = Source{[Schema=SQLServerDatabaseName,Item=”leasing_offers”]}[Data]
        in
        SQLServerDatabaseName_leasing_offers.

        Do you know any method to have a MYsql datasource as variables?

  6. Hi Chris,
    Is it possible to change the dat source from Excel document to a direct query in Power BI desktop?
    I can edit the Source and change to different Excel files, but can’t change from Excel to a direct query. The reason being I have reports develop based on Excel dump of a database and now need to connect to the database to test the reports as direct query.
    Hope you can help. Thanks in advance.

  7. Hi,

    Sorry if I’am out of subject or if my english is approximate but I have a question about multiple database in one server. I want to work on different table from 2 different database in a unique PostGreSql server and it seems that PQ can save just one autorisation information (login et Pwd). Is there a way to handle with this ?

    Many thanks !

  8. In my experience, you can use the Power BI GUI to change the connection string for all queries that use the same connection at the same time. In the ribbon, select Edit Queries > Data source settings. Select the data source you want to change, click the ‘Change Source…’ button, then modify the connection string in the new window. Hit OK and all queries that used that connection string will be updated with the new server/database. In this case I’m referring to a SQL server connection, but it’s worked well for me with Sharepoint, Excel, etc.

    1. Yes, that works too, although I didn’t know about that when I wrote this post. Using parameters for this still has one more advantage though: you can change parameters manually or through the API after the dataset has been published.

Leave a Reply to SebastienCancel reply