Most of the focus on Power BI dataflows so far has been on the use of Power Query Online to load data from external data sources. However, when you create a dataflow you also have the option to attach an existing Common Data Model (CDM) folder stored in Azure Data Lake Storage Gen2:
The documentation has some very detailed information on this subject: you can read how to configure Power BI to use your own Azure Data Lake Storage Gen2 account here, and you can read about how to attach an existing CDM folder here.
The interesting thing for me is that this means that any external service – whether it’s a Microsoft service, a third party service or something you build yourself – can output data into a folder in Azure Data Lake Storage Gen2, and so long as it’s in the right format you can expose this data as a dataflow in Power BI. So, I thought, why not create some data manually, put it in the cloud and see if I can get Power BI to attach it as a dataflow?
It turns out it isn’t too difficult to do, and what’s more when you look at the documentation for the Common Data Model and specifically the documentation for the format of the model.json file there are all kinds of interesting features to check out.
Consider the following folder containing three files, all created manually and uploaded to my Azure Data Lake Storage Gen2 account using Azure Storage Explorer:
The two csv files are pretty simple:
The model.json file, which contains the metadata for the dataflow, is also not that complicated:
This model.json file describes a single entity called Sales with three columns: Month, Product and Sales. It also specifies that the data for this entity is stored in two partitions and these partitions are linked to the two csv files above (I’m pretty sure this is how incremental refresh is handled in dataflows too). I like the idea of another application being able to save new csv files to a folder and, after updating the model.json file to add a new partition, this data being appended onto the table of data returned by the entity.
It took a few goes to attach the folder to a dataflow, mostly because when it fails the error messages (at least at the time of writing) are completely unhelpful. Two things I ran into are:
- Any spaces in folder or file paths need to be url encoded, ie replaced with %20
- You really do need to grant permissions on the folder to the Power BI users who are going to access the dataflow. RTFM.
After that, the dataflow is visible in the Power BI service:
…and the data from the Sales entity can be loaded into your dataset in Power BI Desktop:
As I said, there’s more for me to research here. For example, looking in the documentation, you can create two types of entities – Local and Referenced (are the latter the same as Linked Entities?) – and you can also define relationships between entities (do these become relationships between tables in Power BI?). I think this opens up a lot of new possibilities for close integration between Power BI and other services!