Defining Relationships Between Entities In The Common Data Model To Automatically Create Relationships In Power BI

Following on from my last post on attaching manually-created Common Data Model folders in Power BI dataflows, I was asked whether defining relationships between entities in the model.json file of the CDM folder results in relationships being created between tables in a Power BI dataset. I’ve just tested it, and I can confirm it does.

Consider a CDM folder that contains two entities, Sales and Fruit. The Sales entity contains monthly sales data:

image

…and the Fruit entity contains a list of fruit sold:

image

Now –and this is important – let’s say you have defined the entities in the model.json so that the Sales entity  has attributes Month, Product and Sales:

image

…and the Fruit entity has a single attribute called Fruit:

image

Because the Sales entity and the Fruit entity use different names for the attributes that contain the names of the fruit sold, when the two entities are loaded into Power BI no relationships are created between the two tables. There’s no way that Power BI can guess that there should be a relationship between the Product and Fruit columns based on these names.

However, if you add a relationship definition (as documented here) to the model.json file like so:

image

Then, when you load the two entities as tables in the same Power BI dataset, you get a relationship created automatically:

image

As far as I can see it does this by adding a key definition to the Sales table in a similar way to the way the Table.AddKeyM M function/Remove Duplicates transformation does, as I blogged here.

Now if we could add DAX calculations (calculated columns and especially measures) to the definition of the model.json file, so they were automatically created when entities were imported from a dataflow, that would be really cool. I don’t see why this would not be possible because you could store all kinds of useful information – such as the DAX for the calculations – in the metadata record of an M query (you can access this yourself using the Value.Metadata M function) loaded into a dataset, and the Power BI engine could read it when the table is loaded and create the calculations from this.

Attaching Manually-Created Common Data Model Folders In Power BI Dataflows

Most of the focus on Power BI dataflows so far has been on the use of Power Query Online to load data from external data sources. However, when you create a dataflow you also have the option to attach an existing Common Data Model (CDM) folder stored in Azure Data Lake Storage Gen2:

image

The documentation has some very detailed information on this subject: you can read how to configure Power BI to use your own Azure Data Lake Storage Gen2 account here, and you can read about how to attach an existing CDM folder here.

The interesting thing for me is that this means that any external service – whether it’s a Microsoft service, a third party service or something you build yourself – can output data into a folder in Azure Data Lake Storage Gen2, and so long as it’s in the right format you can expose this data as a dataflow in Power BI. So, I thought, why not create some data manually, put it in the cloud and see if I can get Power BI to attach it as a dataflow?

It turns out it isn’t too difficult to do, and what’s more when you look at the documentation for the Common Data Model and specifically the documentation for the format of the model.json file there are all kinds of interesting features to check out.

Consider the following folder containing three files, all created manually and uploaded to my Azure Data Lake Storage Gen2 account using Azure Storage Explorer:

image

The two csv files are pretty simple:

image

image

The model.json file, which contains the metadata for the dataflow, is also not that complicated:

image

This model.json file describes a single entity called Sales with three columns: Month, Product and Sales. It also specifies that the data for this entity is stored in two partitions and these partitions are linked to the two csv files above (I’m pretty sure this is how incremental refresh is handled in dataflows too). I like the idea of another application being able to save new csv files to a folder and, after updating the model.json file to add a new partition, this data being appended onto the table of data returned by the entity.

It took a few goes to attach the folder to a dataflow, mostly because when it fails the error messages (at least at the time of writing) are completely unhelpful. Two things I ran into are:

  • Any spaces in folder or file paths need to be url encoded, ie replaced with %20
  • You really do need to grant permissions on the folder to the Power BI users who are going to access the dataflow. RTFM.

After that, the dataflow is visible in the Power BI service:

image

…and the data from the Sales entity can be loaded into your dataset in Power BI Desktop:

image

image

As I said, there’s more for me to research here. For example, looking in the documentation, you can create two types of entities – Local and Referenced (are the latter the same as Linked Entities?) – and you can also define relationships between entities (do these become relationships between tables in Power BI?). I think this opens up a lot of new possibilities for close integration between Power BI and other services!

%d bloggers like this: