Create Your Own Relationships Between Tables In The Excel Data Model With Power Query

You probably know that, when you are importing data from multiple tables in SQL Server into the Excel Data Model in Excel 2013 using Power Query, Power Query will automatically create relationships between those tables in the Data Model. But did you know that you can get Power Query to do this for other data sources too?

Now wait – don’t get excited. I’ve known about this for a while but not blogged about it because I don’t think it works all that well. You have to follow some very precise steps to make it happen and even then there are some problems. That said, I think we’re stuck with the current behaviour (at least for the time being) so I thought I might as well document it.

Consider the following Excel worksheet with two tables in it, called Dimension and Fact:


If you were to load these two tables into the Excel Data Model, you would probably want to create a relationship between the two tables based on the FruitID column. Here are the steps to use Power Query to create the relationship automatically:

  1. Click inside the Dimension table and then, on the Power Query tab in the Excel ribbon, click the From Table button to create a new query.
  2. When the Query Editor window opens, right click on the FruitID column and select Remove Duplicates.
    Why are we doing this when there clearly aren’t any duplicate values in this column? The new step contains the expression
    Table.Distinct(Source, {"FruitID"})
    …and one of the side-effects of using Table.Distinct() is that it adds a primary key to the table. Yes, tables in Power Query can have primary keys – the Table.AddKey() function is another way of doing this. There’s a bit more information on this subject in my Power Query book, which I hope you have all bought! (There’s a short M sketch showing both approaches after this list.)
  3. Click the Close & Load To… button to close the Query Editor, and then choose the Only Create Connection option to make sure the output of the query is not loaded anywhere and the query is disabled, then click the Load button. (Am I the only person who doesn’t like this new dialog? I thought the old checkboxes were much simpler, although I do appreciate the new flexibility on where to put your Excel table output.)
  4. Click inside the Fact table in the worksheet, click the From Table button again and this time do load it into the Data Model.
  5. Next, in the Power Query tab in the Excel ribbon, click the Merge button. In the Merge dialog select Dimension as the first table, Fact as the second, and in both select the FruitID column to join on.
  6. Click OK and the Query Editor window opens again. Click the Close & Load To… button again, and load this new table into the Data Model.
  7. Open the Power Pivot window and you will see that not only have your two tables been loaded into the Data Model, but a relationship has been created between the two:
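As an aside, here’s a minimal M sketch of the two ways of putting a primary key on a table that step 2 mentions. It isn’t what the steps above generate, but it uses the same Dimension table and FruitID column from the example:

let
    Source = Excel.CurrentWorkbook(){[Name="Dimension"]}[Content],
    //removing duplicates on FruitID adds a primary key as a side-effect
    RemovedDuplicates = Table.Distinct(Source, {"FruitID"}),
    //Table.AddKey marks FruitID as a primary key explicitly
    AddedKey = Table.AddKey(RemovedDuplicates, {"FruitID"}, true)
in
    AddedKey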

What are the problems I talked about, then? Well, for a start, if you don’t follow these instructions exactly you won’t get the relationship created – it is much harder than I would like. There may be other ways to make sure the relationships are created, but I haven’t found them yet (if you know of an easier way, please leave a comment!). Secondly, if you delete the two tables from the Data Model, delete the two Power Query queries, and then follow these steps again, you will find the relationship is not created. That can’t be right. Thirdly, I don’t like having to create a third query with the Merge; I would prefer to create just two queries and define the relationship somewhere separately. With all of these issues I don’t think there’s any practical use for this functionality right now.

I think the ability to create relationships automatically is so important because the one thing that the Excel Data Model/Power Pivot/SSAS Tabular sorely lacks is a simple way to script the structure of a model. Could Power Query and M one day be the modelling language that Marco asks for here? To be fair to the Power Query team, this is not and should not be their core focus right now: Power Query is all about data acquisition, and this is data modelling. Solving this problem properly would take a lot of thought and a lot of effort. I would love to see it solved one day though.

You can download the sample workbook for this post here.

Using Slicer Selections In The CubeSet Function

I had an interesting challenge from a customer yesterday – one of those problems that I’d known about for a long time but never got round to working out the solution for…

Consider the following PivotTable, based on a PowerPivot model using Adventure Works data, in Excel 2010:


It shows the top 10 products by the measure Sum of Sales. There are two slicers, and the top 10 shown in the PivotTable reflects the selections made in the slicers. All of this works fine. But what if you want to use Excel cube functions to do the same thing? You can write the MDX for the top 10 products quite easily and use it in the CubeSet() function in your worksheet, but how can you get your MDX set expression to respect the selection made in the slicers?

The solution to this problem is very similar to the trick I showed here – finding the selected items in a slicer is not easy! Here are the steps I followed to do it:

  • Add the slicers for EnglishOccupation and CalendarYear to a new worksheet
  • Go to Slicer Settings and uncheck the box for “Visually indicate items with no data”
  • Add two new PivotTables to the worksheet. Connect one to the EnglishOccupation slicer and put EnglishOccupation on rows; connect the other to the CalendarYear slicer and put CalendarYear on rows.
  • Use the OLAPPivotTableExtensions add-in (which you can download here) to add new MDX calculated measures to each PivotTable. For the EnglishOccupation PivotTable call the measure SelectedOccupations and use the following MDX:
    SetToStr(Except(Axis(0), {[Customer].[EnglishOccupation].[All]}))
    This expression uses the Axis() function to find the set of members selected on what Excel thinks of as the rows axis of the PivotTable (actually the MDX columns axis), uses Except() to remove the All Member from the hierarchy (which Excel uses for the Grand Totals), and then uses SetToStr() to return the string representation of that set. Do the same thing for the PivotTable showing CalendarYear, calling the calculated measure SelectedYears; the MDX in this case is:
    SetToStr(Except(Axis(0), {[Date].[CalendarYear].[All]}))
    This is what the EnglishOccupation PivotTable should look like:
  • Next, to make things easy, use Excel formulas to get the values from the top cell inside each PivotTable into cells elsewhere in the worksheet, and give these cells the names SelectedOccupations and SelectedYears.
  • Then enter a CubeSet() function into a new cell using the following formula:
    "PowerPivot Data",
    Sum(" & SelectedOccupations & " * " & SelectedYears & ",[Measures].[Sum of Sales])
    "Top 10 Set")
    What this does is use the TopCount() function to find the top 10 Products, and in the third parameter of this function which is the numeric expression to find the top 10 by, it crossjoins the two sets of selected occupations and selected years and then sums the output of the crossjoin by the measure [Sum of Sales].
  • Last of all, build your report using the Excel cube functions as normal, using the CubeRankedMember() function to get each item from the top 10 set created in the previous step.
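For example, if the CubeSet() formula above is entered in a cell named Top10Set (a hypothetical name – use whatever suits your workbook), the first item in the set can be returned with:

=CUBERANKEDMEMBER("PowerPivot Data", Top10Set, 1)

Fill this down, incrementing the final rank parameter from 1 to 10, to return the whole of the top 10.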


You can download my sample workbook here.

The bad news about this technique is that it doesn’t work in Excel 2013 and Power Pivot. It’s no longer possible to create MDX calculated measures on Power Pivot models in Excel 2013, alas. It will work against Analysis Services in any version of Excel from 2007 onwards and, as I show here, in Excel 2010 with PowerPivot. If you are using Power Pivot and Excel 2013 it might be possible to create a DAX measure to do the same as the MDX I’ve used here (I’m wondering if the technique Jason describes here will work). It would certainly be possible to use CubeRankedMember() to find each item selected in the slicer, as Erik Svensen shows here, then use Excel formulas to find the MDX unique name for each selected member and concatenate these unique names into the set expression that my calculated measures return – but that’s a topic for another post. This really should be a lot easier than it is…

Point-In-Time Dimension Reporting In DAX

Before I start, I have to state that the technique shown in this post isn’t mine but was developed by my colleague Andrew Simmans, who has very kindly allowed me to blog about it.

Over the last few months I’ve been working on an SSAS Tabular project that has not only presented some interesting modelling challenges, but has shown how DAX can offer some new and interesting solutions to these challenges. Consider the following scenario: a supermarket sells products, and we have a fact table showing sales of products by day. Here’s some sample data:


To complicate matters, each product has one product manager, but the product manager for a particular product changes from time to time. Normally this might be solved by adding the product manager name to the Product dimension table and implementing a Type 2 Slowly Changing Dimension. In this case, though, we want something slightly different. Instead of attributing sales to the product manager who was in charge of the product at the time of the sale – which would mean seeing sales for the same product attributed to different product managers on different dates – we want to attribute all sales for a product to a single product manager, and use a second date dimension to determine the point in time (and therefore the product manager in charge of each product at that point in time) that we want to report as of. To put it another way, we want to be able to find the state of a dimension on any given date and use that version of the dimension to do our analysis.

For example, we have the following table showing which product manager was in charge of each product at any given point in time:
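Product    Manager    StartDate    EndDate
Oranges    Jim        1/1/2013     1/3/2013
Oranges    Rob        1/4/2013
Apples     Fred       1/1/2013

(A blank EndDate means that manager is still in charge of the product.)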


Between January 1st 2013 and January 3rd 2013 Jim was the product manager for Oranges, but from January 4th 2013 onwards Rob took over as product manager for Oranges; Fred was the product manager for Apples the whole time. We want a PivotTable that looks like this when we choose to report as of January 2nd 2013:


Notice how Jim is shown as the product manager for Oranges. If we wanted to report using the managers as of January 5th 2013, we would want to see Rob shown as the product manager for Oranges like so:


The solution to this problem involves a combination of two DAX techniques that have already been blogged about quite extensively and which I’d encourage you to read up on:

  • Many-to-many relationships, in this case the solution developed by Gerhard Brueckl, described on his blog here.
  • ‘Between’ date filters, which I wrote about recently but which Alberto has recently improved on in his must-read white paper here.

Here are the table relationships I’ve used for the sample scenario:


I’ve added a second date table called ReportingDate which contains the same rows as the Date table shown above; note that it has no relationship with any other table.

This problem is very similar to a many-to-many relationship in that a product can have many managers across time, and a manager can have many products. Indeed we could model this as a classic many-to-many relationship by creating a bridge table with one row for each valid combination of product and manager for each possible reporting date; on my project, however, this was not a viable solution because it would have resulted in a bridge table with billions of rows in it. Therefore, instead of joining the ReportingDate table directly to the ProductManager table, we can instead filter ProductManager using the between date filter technique.

Here’s the DAX for the Sum of Sales measure used in the PivotTables shown above:

Sum of Sales:=
IF(
    HASONEVALUE(ReportingDate[ReportingDate]),
    CALCULATE(
        SUM(Sales[Sales]),
        FILTER(ProductManager, MIN(ReportingDate[ReportingDate])>=ProductManager[StartDate]
            &&
            IF(ISBLANK(ProductManager[EndDate]), TRUE(),
                MIN(ReportingDate[ReportingDate])<=ProductManager[EndDate])
        )
    )
)

(I’ve assumed here that the fact table and its sales column are both called Sales – adjust the SUM() to match your own model.)
This is not necessarily the best way to write the code from a performance point of view but it’s the most readable – if you need better performance I recommend you read Alberto’s white paper. What I’m doing is this:

  • Only return a value if a single reporting date is selected
  • Filter the ProductManager table so only the rows where the selected reporting date is between the start date and the end date are returned, ie we only get the rows where a manager was in charge of a product on the reporting date
  • Use the filtered ProductManager table to filter the main fact table using the Calculate() function, in exactly the same way that you would with a many-to-many relationship
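To test the measure outside a PivotTable, a DAX query along these lines should work – this is just a sketch, and it assumes the manager column on the ProductManager table is called Manager:

EVALUATE
ADDCOLUMNS (
    VALUES ( ProductManager[Manager] ),
    "Sales as of Jan 2nd 2013",
    CALCULATE ( [Sum of Sales], ReportingDate[ReportingDate] = DATE ( 2013, 1, 2 ) )
)

With the sample data above, Orange sales show up against Jim because his tenure covers the selected reporting date.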

You can download my sample workbook here.

A New Events-In-Progress DAX Pattern

I’ve been working on a very complex SSAS Tabular implementation recently, and as a result I’ve learned a few new DAX tricks. The one that I’m going to blog about today takes me back to my old favourite, the events-in-progress problem. I’ve blogged about it a lot of times, looking at solutions for MDX and DAX (see here and here), and for this project I had to do some performance tuning on a measure that uses a filter very much like this.

Using the Adventure Works Tabular model, the obvious way of finding the number of Orders on the Internet Sales table that are open on any given date (ie where the Date is between the dates given in the Order Date and the Ship Date column) is to write a query something like this:



EVALUATE
ADDCOLUMNS (
    VALUES ( 'Date'[Date] ),
    "Open Orders",
    CALCULATE (
        COUNTROWS ( 'Internet Sales' ),
        FILTER ( 'Internet Sales', 'Internet Sales'[Ship Date] > 'Date'[Date] ),
        FILTER ( 'Internet Sales', 'Internet Sales'[Order Date] <= 'Date'[Date] )
    )
)
ORDER BY 'Date'[Date]

(I’ve called the new column Open Orders – any name will do.)

On my laptop this executes in around 1.9 seconds on a cold cache. However, after a bit of experimentation, I found the following query was substantially faster:



EVALUATE
ADDCOLUMNS (
    VALUES ( 'Date'[Date] ),
    "Open Orders",
    COUNTROWS (
        FILTER (
            'Internet Sales',
            CONTAINS (
                DATESBETWEEN (
                    'Date'[Date],
                    'Internet Sales'[Order Date],
                    DATEADD ( 'Internet Sales'[Ship Date], -1, DAY )
                ),
                [Date],
                'Date'[Date]
            )
        )
    )
)
ORDER BY 'Date'[Date]

On a cold cache this version executes in just 0.2 seconds on my laptop. What’s different? In the first version of the calculation the FILTER() function is used to find the rows in Internet Sales where the Order Date is less than or equal to the Date on rows, and where the Ship Date is greater than the Date. This is the obvious way of solving the problem. In the new calculation the DATESBETWEEN() function is used to create a table of dates from the Order Date to the day before the Ship Date for each row on Internet Sales, and the CONTAINS() function is used to see if the Date we’re interested in appears in that table.

I’ll be honest and admit that I’m not sure why this version is so much faster, but if (as it seems) this is a generally applicable pattern then I think this is a very interesting discovery.

Thanks to Marco, Alberto and Marius for the discussion around this issue…

UPDATE: Scott Reachard has done some further testing on this technique, and found that the performance is linked to the size of the date ranges: the shorter your date ranges, the faster the performance, so if you have large date ranges this may not be the best-performing solution. See

UPDATE: Alberto has done a lot more research into this problem, and come up with an even faster solution. See:

Comments And Descriptions In DAX

With my Technitrain hat on I’m sitting in on Marco’s Advanced DAX course in London today, and the question of comments in DAX came up – which reminded me that this is something I’ve been meaning to blog about. DAX as a language supports comments, but unfortunately it’s not possible to add comments inside a DAX measure or calculated column expression in either PowerPivot or SSAS Tabular right now (which is where they’re most needed – I hope this changes in the future). That said, there are some other things you can do to add textual explanations and descriptions to your DAX measure code.

Before we get onto the workarounds, a quick word about comments in DAX. These can only be used in DAX queries, and the types of comment supported are the same as in MDX: double-dashes and double-forward-slashes for single line comments, and forward-slash-asterisk to start a multi-line comment and asterisk-forward-slash to close a multi-line comment. Here’s an example:

--single line comment

//another single line comment

/*a multi-line
comment*/

evaluate table1


What can be done with measures though? After all, that’s where the most complex DAX is usually written.

First of all, you can add a description to a measure by right-clicking on it in the measure grid and selecting Description:



Unfortunately this description is not easily accessible to end users anywhere (it would be great if it appeared as a tooltip in a PivotTable, for example) but it can be seen in an Excel worksheet by running a DMV query. DMV queries can be run in Excel 2013 in the same way as DAX queries, using a query table as described here; the DMV query to use is:


select
measure_name as [Measure Name], [description], measure_is_visible
from $system.mdschema_measures
where cube_name='model'



Unfortunately all hidden and implicit measures are returned, and even when the table is filtered so that only rows where measure_is_visible=true are shown there are still a lot of measures that probably shouldn’t be displayed.

Similarly, descriptions can be added to any column (calculated or not) in your model, again by right-clicking on it and selecting Description.


This description can be displayed in the worksheet using the following DMV query:


select
hierarchy_name as [Column Name], [description] as [Description]
from $system.mdschema_hierarchies
where cube_name='model'



You can also write text directly in cells in the measure grid. When I first saw a customer do this I was worried that it might not be supported, but I’ve been told that it is; as long as you don’t use the := used for defining measures, you should be OK.


This is probably the best way to add comments to your code, if only because it’s the most visible to anyone looking at your PowerPivot/SSAS Tabular model. Of course, for it to be effective you’ll need to have a system for arranging your measures in the measure grid; in “SQL Server Analysis Services 2012: The BISM Tabular Model”, Marco, Alberto and I recommended that you arrange all your measures in the top-left hand corner of the measure grid and I think that’s still a good idea, but the use of text in cells to create headings for groups of measures as well as descriptions can help a lot too.

Accumulating Data In An Excel Table Using Data Explorer (Power Query) and PowerPivot

NOTE: This post was written before Data Explorer was renamed as Power Query. All of the content is still relevant to Power Query.

One of the first questions I get asked after showing someone PowerPivot for the first time is “Can I add new data to a PowerPivot table that already has data in it?”. Out of the box, of course, the answer is no: when you process a table in PowerPivot you have to reload all the data from your data source, you can’t just append new data (unless you’re using copy/paste to load data, which isn’t a good idea). However, there are a lot of self-service BI scenarios where the ability to do this would be extremely useful: for example, you might want to scrape stock quotes from a web page every day and then, in an Excel workbook, accumulate that data in a table so you can analyse historical stock prices with PowerPivot. I ran into a scenario very much like this last week and I thought that Data Explorer should be able to help here. It can, but it’s not obvious how to do it – hence this blog post!

Here’s a super-simple example of how to accumulate data in a table then. Let’s start with a csv file that contains the following data:


It’s straightforward to import this data into Excel using Data Explorer and the ‘From csv’ data source:



Here’s the code that Data Explorer generates:


let
    Source = Csv.Document(File.Contents("C:\InputData.csv")),
    FirstRowAsHeader = Table.PromoteHeaders(Source),
    ChangedType = Table.TransformColumnTypes(FirstRowAsHeader,
                              {{"Product", type text}, {"Sales", type number}})
in
    ChangedType

Now, let’s imagine that you want to keep the data from this file in Excel and every time you click Refresh in Data Explorer you add the data from the file onto the end of the existing data you’ve already captured. The first thing you’ll probably want to do in this scenario is add a new column to the data that gives the date and time that the data was loaded, and you can do that quite easily in Data Explorer using the DateTimeZone.UtcNow() function as follows:

Table.AddColumn(ChangedType, "Load Date", each DateTimeZone.UtcNow())

Data Explorer has functionality to append the data from one query onto the end of another query, but the problem you have to solve now is that when you click Refresh you want the new data to be appended onto the end of the data that has already been collected. It’s a recursive scenario not unlike the one I grappled with here. The solution to this problem is to first of all load the data into PowerPivot (ie what we should be calling the Excel Data Model now) by clicking on the Load To Data Model link in the Data Explorer query pane:


Then, on a new sheet, create an Excel query table that returns all the data from the PowerPivot table that you’ve just loaded data into. Kasper shows how to do this here; there’s no need for any special DAX, you just need to connect to the PowerPivot table in the Existing Connections dialog:



At this point you should have two tables on two sheets that contain the same data. The next step is to modify the original Data Explorer query so that it contains a new step that appends data from the table you’ve just created (ie the table getting the data from PowerPivot) onto the data from the csv file. This can be done with three new steps, first to get the data from the new Excel table:

GetExistingData = Excel.CurrentWorkbook(){[Name="Table_Input_Data"]}[Content]
Then to make sure the Load Date is treated as a DateTimeZone type:

Table.TransformColumnTypes(GetExistingData,{{"Load Date", type datetimezone}})

Then finally to combine the two tables:

Table.Combine({ChangedType1, InsertedCustom})
Now, whenever you Refresh your Data Explorer query, you will see the data from the csv file appended to the data that has already been loaded:



Here’s the complete code:


let
    Source = Csv.Document(File.Contents("C:\InputData.csv")),
    FirstRowAsHeader = Table.PromoteHeaders(Source),
    ChangedType = Table.TransformColumnTypes(FirstRowAsHeader,
                  {{"Product", type text}, {"Sales", type number}}),
    InsertedCustom = Table.AddColumn(ChangedType, "Load Date", each DateTimeZone.UtcNow()),
    Custom1 = Excel.CurrentWorkbook(){[Name="Table_Input_Data"]}[Content],
    ChangedType1 = Table.TransformColumnTypes(Custom1,{{"Load Date", type datetimezone}}),
    Custom2 = Table.Combine({ChangedType1, InsertedCustom})
in
    Custom2

Now as I said, this is just a super-simple example and in the real world you’d need extra functionality to do things like delete rows you’ve already loaded and so on; but that’s all doable I think. It’s also worth mentioning that I encountered some strange errors and behaviour when implementing this, partly due to Data Explorer still being in preview I guess, so if you want to recreate this query you’ll need to follow my instructions exactly.
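As one example of what that extra functionality might look like, the final step of the query above could be replaced with something like the following sketch, which only appends incoming rows whose Product value hasn’t already been captured (this assumes Product can be treated as a key, and isn’t part of the original query):

    ExistingProducts = Table.Column(ChangedType1, "Product"),
    //keep only the incoming csv rows whose Product value is new
    NewRows = Table.SelectRows(InsertedCustom, each not List.Contains(ExistingProducts, [Product])),
    //append just the new rows to the data already captured
    Custom2 = Table.Combine({ChangedType1, NewRows})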

You can download the sample workbook here, and the csv file here.

PowerPivot Workbook Size Optimizer

Browsing through my RSS feeds this morning, I saw a new download on the Microsoft site: an Excel add-in (Excel 2013 only, I think) called the PowerPivot Workbook Size Optimizer. You can get it here:

Here’s the blurb from the site:

The Workbook Size Optimizer for Excel can better compress data inside workbooks that use PowerPivot or Power View if this data comes from external data sources. The best size compression can be achieved for workbooks based on SQL Server databases, and there are a few tricks we can do for other SQL data sources as well. The optimizer installs as an add-in to Excel and provides a wizard to better compress the size of your workbook. Using the optimizer you can often get datasets of more than 1,000,000 rows into a workbook under 10 MB, share it in SharePoint Online and interact with it using the Excel Web App in any browser.

Here’s a screenshot:


Despite testing a few models with data from Adventure Works I couldn’t get it to suggest any changes (it didn’t spot that I had imported a column containing binary data, hmmm) but I guess it needs more testing on larger/more diverse data sources. Maybe there’s a blog post coming soon from the PowerPivot team explaining how to use it?

UPDATE: after playing around with it a bit more, I was able to get it to suggest some changes to tables. Marco has some more details:

And there’s a white paper on the rules that it uses: