Attaching Manually-Created Common Data Model Folders In Power BI Dataflows

Most of the focus on Power BI dataflows so far has been on the use of Power Query Online to load data from external data sources. However, when you create a dataflow you also have the option to attach an existing Common Data Model (CDM) folder stored in Azure Data Lake Storage Gen2:

[Image: the option to attach a Common Data Model folder when creating a dataflow]

The documentation has some very detailed information on this subject: you can read how to configure Power BI to use your own Azure Data Lake Storage Gen2 account here, and you can read about how to attach an existing CDM folder here.

The interesting thing for me is that this means that any external service – whether it’s a Microsoft service, a third party service or something you build yourself – can output data into a folder in Azure Data Lake Storage Gen2, and so long as it’s in the right format you can expose this data as a dataflow in Power BI. So, I thought, why not create some data manually, put it in the cloud and see if I can get Power BI to attach it as a dataflow?

It turns out it isn't too difficult to do and, what's more, when you look at the documentation for the Common Data Model – and specifically the documentation for the format of the model.json file – there are all kinds of interesting features to check out.

Consider the following folder containing three files, all created manually and uploaded to my Azure Data Lake Storage Gen2 account using Azure Storage Explorer:

[Image: Azure Storage Explorer showing a folder containing two csv files and a model.json file]

The two csv files are pretty simple:

[Image: the contents of the first csv file]

[Image: the contents of the second csv file]
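To give a rough idea of what's in them, here are some made-up contents along the same lines (the actual values don't matter; note that, following the CDM folder conventions, the files contain no header row because the column names come from model.json):

    Jan,Apples,20
    Jan,Oranges,15
    Feb,Apples,25

…and the second file:

    Mar,Apples,30
    Mar,Oranges,18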

The model.json file, which contains the metadata for the dataflow, is also not that complicated:

[Image: the contents of the model.json file]

This model.json file describes a single entity called Sales with three columns: Month, Product and Sales. It also specifies that the data for this entity is stored in two partitions, and that these partitions are linked to the two csv files above (I'm pretty sure this is how incremental refresh is handled in dataflows too). I like the idea of another application being able to save new csv files to a folder and then, after the model.json file has been updated with a new partition, having that data appended onto the table of data returned by the entity.
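As a rough sketch of what that looks like (the storage account, filesystem and file names here are placeholders, and the data types are my guesses), the model.json is something like this:

    {
      "name": "SalesModel",
      "version": "1.0",
      "entities": [
        {
          "$type": "LocalEntity",
          "name": "Sales",
          "attributes": [
            { "name": "Month", "dataType": "string" },
            { "name": "Product", "dataType": "string" },
            { "name": "Sales", "dataType": "int64" }
          ],
          "partitions": [
            {
              "name": "SalesPartition1",
              "location": "https://myaccount.dfs.core.windows.net/powerbi/Sales/Sales1.csv"
            },
            {
              "name": "SalesPartition2",
              "location": "https://myaccount.dfs.core.windows.net/powerbi/Sales/Sales2.csv"
            }
          ]
        }
      ]
    }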

It took a few goes to attach the folder to a dataflow, mostly because when it fails the error messages (at least at the time of writing) are completely unhelpful. Two things I ran into were:

  • Any spaces in folder or file paths need to be URL-encoded, i.e. replaced with %20 (see the example after this list)
  • You really do need to grant permissions on the folder to the Power BI users who are going to access the dataflow. RTFM.
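For example, if one of the csv files lived at a path with spaces in it, the partition location would need to look something like this (the account and path here are hypothetical):

    {
      "name": "SalesPartition1",
      "location": "https://myaccount.dfs.core.windows.net/powerbi/Sales%20Data/Sales%201.csv"
    }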

After that, the dataflow is visible in the Power BI service:

[Image: the attached dataflow visible in the Power BI service]

…and the data from the Sales entity can be loaded into your dataset in Power BI Desktop:

[Image: navigating to the Sales entity in the Power BI Desktop dataflows connector]

[Image: the data from the Sales entity loaded into Power BI Desktop]
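For the curious, the query that the dataflows connector generates in Power BI Desktop looks something like the following M (going from memory, with placeholder ids – don't treat this as authoritative):

    let
        // Connect to all the dataflows the current user has access to
        Source = PowerBI.Dataflows(null),
        // Navigate to the workspace and then the dataflow (ids are placeholders)
        Workspace = Source{[workspaceId = "00000000-0000-0000-0000-000000000000"]}[Data],
        Dataflow = Workspace{[dataflowId = "00000000-0000-0000-0000-000000000000"]}[Data],
        // Return the table exposed by the Sales entity
        Sales = Dataflow{[entity = "Sales"]}[Data]
    in
        Sales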

As I said, there’s more for me to research here. For example, looking in the documentation, you can create two types of entities – Local and Referenced (are the latter the same as Linked Entities?) – and you can also define relationships between entities (do these become relationships between tables in Power BI?). I think this opens up a lot of new possibilities for close integration between Power BI and other services!
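From a quick skim of the model.json documentation, a relationship definition looks something like the fragment below – though I haven't tested this yet, and the entity and attribute names are invented, so check the spec before relying on it:

    "relationships": [
      {
        "$type": "SingleKeyRelationship",
        "fromAttribute": { "entityName": "Sales", "attributeName": "Product" },
        "toAttribute": { "entityName": "Product", "attributeName": "ProductName" }
      }
    ]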

12 responses

  1. Pingback: Attaching Manually-Created Common Data Model Folders In Power BI Dataflows — Chris Webb’s BI Blog | MS Excel | Power Pivot | DAX | SSIS |SQL

  2. I’m curious to know how the model.json file will work in conjunction with parquet files considering they also store metadata, e.g., field types, inside them.

      • Fingers crossed. However, I'd be surprised if it happens soon, because one big limitation of Power Query over the past few years has been that it cannot natively read parquet files. I assume they'll either: (a) refactor the existing Mashup engine; or (b) fold M to a Spark proxy, e.g. Databricks. Neither of these options is easy.

      • I don’t think this has anything to do with the Power Query engine – with an attached CDM folder there’s no need to use Power Query Online, and in Power BI Desktop my guess is that you’ll be going through a standard API to get data from the dataflow, however it’s created, rather than reading the parquet files directly.

  3. Pingback: Attaching Manually-Created Common Data Model Folders In Power BI Dataflows — Chris Webb’s BI Blog – SutoCom Solutions

  4. If Microsoft separates the SQL Server storage from the engine, and the storage can use Azure Data Lake with CDM metadata, then you could go hog wild with SQL over CDM. What a dream. Chris, have you tried to define the relationship metadata in JSON and seen if Power Query can understand it? I doubt it at this point; I think only PowerApps CDS fully utilizes the entire specification.

    • Suhail – With Azure SQL Data Warehouse, we do have PolyBase which can access remote data in ADLS Gen2 via an external table definition. Remote queries via data virtualization like that don’t perform optimally because the SQL engine has to pull full data files over in order to perform a where statement which retrieves only partial data from a file (i.e., there’s no compute engine natively in ADLS Gen2 currently to perform data pruning or predicate pushdown before the data goes over to the SQL database). Having said all that, the functionality is available so theoretically you can already start going a teensy bit wild with SQL over CDM. =) (For the sake of completeness I feel the need to say…PolyBase is strongly encouraged as the way to *load* data to Azure SQL DW, but use PolyBase with caution for remote queries for the reasons I mentioned above. There’s a lot more to it than that, but that’s the gist.)

      • I forgot about PolyBase. That does allow SQL over CDM. I had a project using PowerApps where I was shuttling data between the CDM and SQL Server via Microsoft Flow, because I had to do some complex calculations that could easily be done with the full breadth of T-SQL. Yes, there’s also a huge comfort level and proficiency I’ve built up with SQL, whereas Power Query, Flow and the PowerApps business rules engine could have solved it but would’ve taken longer to puzzle out. Reporting off of CDM via PolyBase, Power Query or staging the data using Azure Data Factory I can handle, and I fully agree that staging the data in SQL is the way to go for data warehousing. It’s the complex read/write business logic where I have trouble, but there’s such a big ecosystem around CDM now that I want to be leveraging it for reporting and application development.

  5. Pingback: Defining Relationships Between Entities In The Common Data Model To Automatically Create Relationships In Power BI « Chris Webb's BI Blog
