Create Your Own Relationships Between Tables In The Excel Data Model With Power Query

You probably know that, when you are importing data from multiple tables in SQL Server into the Excel Data Model in Excel 2013 using Power Query, Power Query will automatically create relationships between those tables in the Data Model. But did you know that you can get Power Query to do this for other data sources too?

Now wait – don’t get excited. I’ve known about this for a while but not blogged about it because I don’t think it works all that well. You have to follow some very precise steps to make it happen and even then there are some problems. That said, I think we’re stuck with the current behaviour (at least for the time being) so I thought I might as well document it.

Consider the following Excel worksheet with two tables in it, called Dimension and Fact:

image

If you were to load these two tables into the Excel Data Model, you would probably want to create a relationship between the two tables based on the FruitID column. Here are the steps to use Power Query to create the relationship automatically:

  1. Click inside the Dimension table and then, on the Power Query tab in the Excel ribbon, click the From Table button to create a new query.
  2. When the Query Editor window opens, right click on the FruitID column and select Remove Duplicates.
    image
    Why are we doing this when there clearly aren’t any duplicate values in this column? The new step contains the expression
    Table.Distinct(Source, {"FruitID"})
    …and one of the side-effects of using Table.Distinct() is that it adds a primary key to the table. Yes, tables in Power Query can have primary keys – the Table.AddKey() function is another way of doing this. There’s a bit more information on this subject in my Power Query book, which I hope you have all bought!
  3. Click the Close & Load to.. button to close the Query Editor, and then choose the Only Create Connection option to make sure the output of the query is not loaded anywhere and the query is disabled, then click the Load button. (Am I the only person that doesn’t like this new dialog? I thought the old checkboxes were much simpler, although I do appreciate the new flexibility on where to put your Excel table output)
    image
  4. Click inside the Fact table in the worksheet, click the From Table button again and this time do load it into the Data Model.
  5. Next, in the Power Query tab in the Excel ribbon, click the Merge button. In the Merge dialog select Dimension as the first table, Fact as the second, and in both select the FruitID column to join on.
    image
  6. Click OK and the Query Editor window opens again. Click the Close & Load to.. button again, and load this new table into the Data Model.
  7. Open the Power Pivot window and you will see that not only have your two tables been loaded into the Data Model, but a relationship has been created between the two:
    image

What are the problems I talked about then? Well, for a start, if you don’t follow these instructions exactly then you won’t get the relationship created – it is much harder than I would like. There may be other ways to make sure the relationships are created but I haven’t found them yet (if you do know of an easier way, please leave a comment!). Secondly if you delete the two tables from the Data Model and delete the two Power Query queries, and then follow these steps again, you will find the relationship is not created. That can’t be right. Thirdly, I don’t like having to create a third query with the Merge, and would prefer it if I could just create two queries and define the relationship somewhere separately. With all of these issues I don’t think there’s any practical use for this functionality right now.

I guess the reason I think the ability to create relationships automatically is so important is because the one thing that the Excel Data Model/Power Pivot/SSAS Tabular sorely lacks is a simple way to script the structure of a model. Could Power Query and M one day be the modelling language that Marco asks for here? To be fair to the Power Query team this is not and should not be their core focus right now: Power Query is all about data acquisition, and this is data modelling. If this problem was solved properly it would take a lot of thought and a lot of effort. I would love to see it solved one day though.

You can download the sample workbook for this post here.

31 thoughts on “Create Your Own Relationships Between Tables In The Excel Data Model With Power Query

  1. Interesting post! I agree this could be useful for automation, though most other times I’ll prefer to create the relationship in Power Pivot.

    And I find the new Load To dialog less intuitive, but appreciate the additional flexibility and having it pointed out that I’m creating a connection in the workbook regardless of whether I load it.

    Your post reminds me of a tangentially related logical question I’ve been trying to answer for myself.

    I will often load from a particular SQL database two related tables, one with general ledger journal numbers and dates, the other with the journal detail (account numbers, amounts, transaction descriptions), and need only a particular date range and a particular account number range. I can filter each table on the respective attributes and load to the data model, but I end up loading a lot of extraneous data (i.e. journal records without the account range in question and detail records outside the date range).

    I know I can merge the tables but I am trying to avoid flattening. Although I’m not intimately familiar with SQL I’m sure there must be a relatively simple way to push the filtering to the database server. Is there a simple way to access that in Power Query?

    Meanwhile, I somehow missed that your book was released. That is exciting. I will be purchasing and hopefully will have a chance to dive into it ASAP.

    1. It would be possible in Power Query, but maybe not simple – you’d need to write some M code to do the filtering, which would not be difficult in itself, but you would have to be sure that you did it in such a way that Power Query generated SQL to do the filtering rather than grabbing all the data and doing the filtering on the desktop. When you get my book look for the section on “Query Folding” and you’ll see what I mean; this post covers similar ground: http://cwebbbi.wordpress.com/2014/06/04/join-conditions-in-power-query-part-2-events-in-progress-performance-and-query-folding/

      1. Just read the post and will check out that section of the book. This gets me going in the right direction. Thanks.

  2. it’s not just you, we don’t like the new load interface either, along with some of the other interface changes that is made every month
    as long as it is for the better I suppose we can’t complain

  3. Hi, thanks for the awesome post.

    But I still have some problems with defining relationships in Power Query/Power Pivot.

    I am getting data from the REST api on the web. It is something about 5 dimensional tables and 1 fact table.
    Everything worked fine, I loaded everything to Power Pivot and set the relationships there.

    But problem is when some error occurs during the refresh of the query. Then PQ will load a table to power pivot with one row containing the description of the error, so every relationship set in power pivot is lost.

    Because of this, the report is unusable, because even if I will refresh data again and everything will load correctly, I dont have relationships between tables.

    Is there some kind of workaround?

    Maybe something like

    let
    Source = try Json.Document(Web.Contents(“http://bla.service.net/api/Tickets/GetTickets”))
    in
    Result = if Source[HasError] then /*Generate table with same columns as Source with no data*/ else /* Continue working with this Source*/

    Thanks for your answer

    1. I see exactly what your problem is, and it’s a good question! My suggestion is this: use a try…otherwise statement to catch the error, and if the error is present return an empty table that has exactly the same structure as the one you are expecting. You can return an empty table using the Table.FromRows() function. This is a good topic for a future blog post I think!

      1. So I tried this and it seems to work:

        let
        ErrorTable = Table.FromRecords({[Name=”Error”, IDTicketSummaryState=1, Descritpion=”Error”]}),
        Source = try Web.Page(Web.Contents(“http://bla.service.net/Help/ResoursceModel?modelName=TicketSummaryEnum”)),
        #”Changed Type” = Table.TransformColumnTypes(Source[Value],{{“Name”, type text}, {“Value”, Int64.Type}, {“Description”, type text}}),
        #”Reorder Columns” = Table.ReorderColumns(#”Changed Type”,{“Value”, “Name”, “Description”}),
        #”Renamed Columns” = Table.RenameColumns(#”Reorder Columns”,{{“Value”, “IDTicketSummaryState”}}),
        #”Changed Type1″ = Table.TransformColumnTypes(#”Renamed Columns”,{{“IDTicketSummaryState”, Int64.Type}, {“Name”, type text}, {“Description”, type text}}),
        Result = if Source[HasError] then ErrorTable else #”Changed Type1″,
        in
        Result

        But I think this could use more explanation in your blog post, many people can have problem with that.

        Maybe this should be a part of PQ, if there is some kind of error in query. dont load it to data model or excel and show user error message. That way you dont screw up the data model and user knows what happened.

        Same problem is with autentication. Imagine, that you are loading data from Odata feed with autentication, PQ prompts user to type login and password, he clicks cancel and PQ loads error message to data model. So all relationships are gone.

        What do you think about this problem?

        Thank you for helping me

      2. Great, I’m glad to see it works! I agree, this is something that should definitely be part of Power Query. I’ll write a blog post about this when I get the chance…

        Regarding authentication: maybe you can get around this with a multiple query approach? Here’s my idea. You have query A that connects to the OData source and may fail if the user does not enter the login and password; this query should be disabled so it does not load to either a table in the worksheet or the Data Model. You have query B that returns an empty table using Table.FromRecords() (thinking about it some more, using #table might be even more elegant); this query is also disabled. Then you have query C which uses query A as a source, but if query A returns an error returns the output of query B; the output of query C loads to the data model. I wonder if this will solve the problem? I’m not sure but it’s worth trying.

  4. Tthank you for your blog and very interesting articles
    My question is if Power Query can create automatically relations from json or xml web services,
    which return an xml schema or in json return objects with arrays of children ?

      1. Rereading your first question, I don’t think I understood what you were asking initially. Are you asking how Power Query deals with web services that return XML and JSON payloads? If so, then it allows you to either navigate to the data that you want and/or flatten the XML or JSON down to a table.

  5. Hi Chris
    Great post

    One q:
    I have two major tables both with duplicates
    In table A I have the product and revenue per day
    In table B I have product by hour with a detailed info but no revenue, only sizes, etc
    So I’ve creates a new table C with the product id as unique to link between the first two
    But I am able to link A+C or B+C but not C (product name or id) A revenue and B with fabrics per example

    What am I missing?
    Happy new year

    1. If you drag Product from C onto the rows or columns axis of your PivotTable, you should be able to use it to analyse the revenue data from A at the same time as the data from B…?

      1. That what I do, I drag from C the product name, from A the rev and from B the fabric or any other text or numerical and I get the error that between A and B there is not relationship and I should create one…
        I can send you the real xlsx if needed

        Thanks

  6. Hi Chris,

    Thanks for your blog. I am a newbie in the PQ world. I have one question.

    Which is a more efficient way to go ahead? Defining a lookup function to get data from a small table (Date Table) to a big table? OR is merging the 2 tables a more efficient way forward?

    TIA

      1. If you’re loading two tables into Power Pivot, you should probably keep them as two tables and create a relationship between them. However, there are scenarios where a single table may be better, so it depends…

  7. Hi Chris, is there a way to do this (create a relationship between two odata tables) so that the query will work when the dashboard is viewed via a SharePoint Online Excel Service Web Part?

  8. i have a problem after removing duplicates before using power query, power query merge process generate duplicates for me which messes up my data. is there something i am doing wrong because i follow the necessary steps

Leave a Reply to Chris WebbCancel reply