Attaching Manually-Created Common Data Model Folders In Power BI Dataflows

Chris Webb Dataflows, Power BI March 13, 2019 2 Minutes

Most of the focus on Power BI dataflows so far has been on the use of Power Query Online to load data from external data sources. However, when you create a dataflow you also have the option to attach an existing Common Data Model (CDM) folder stored in Azure Data Lake Storage Gen2:

The documentation has some very detailed information on this subject: you can read how to configure Power BI to use your own Azure Data Lake Storage Gen2 account here, and you can read about how to attach an existing CDM folder here.

The interesting thing for me is that this means that any external service – whether it’s a Microsoft service, a third party service or something you build yourself – can output data into a folder in Azure Data Lake Storage Gen2, and so long as it’s in the right format you can expose this data as a dataflow in Power BI. So, I thought, why not create some data manually, put it in the cloud and see if I can get Power BI to attach it as a dataflow?

It turns out it isn’t too difficult to do, and what’s more when you look at the documentation for the Common Data Model and specifically the documentation for the format of the model.json file there are all kinds of interesting features to check out.

Consider the following folder containing three files, all created manually and uploaded to my Azure Data Lake Storage Gen2 account using Azure Storage Explorer:

The two csv files are pretty simple:

The model.json file, which contains the metadata for the dataflow, is also not that complicated:

This model.json file describes a single entity called Sales with three columns: Month, Product and Sales. It also specifies that the data for this entity is stored in two partitions and these partitions are linked to the two csv files above (I’m pretty sure this is how incremental refresh is handled in dataflows too). I like the idea of another application being able to save new csv files to a folder and, after updating the model.json file to add a new partition, this data being appended onto the table of data returned by the entity.

It took a few goes to attach the folder to a dataflow, mostly because when it fails the error messages (at least at the time of writing) are completely unhelpful. Two things I ran into are:

Any spaces in folder or file paths need to be url encoded, ie replaced with %20
You really do need to grant permissions on the folder to the Power BI users who are going to access the dataflow. RTFM.

After that, the dataflow is visible in the Power BI service:

…and the data from the Sales entity can be loaded into your dataset in Power BI Desktop:

As I said, there’s more for me to research here. For example, looking in the documentation, you can create two types of entities – Local and Referenced (are the latter the same as Linked Entities?) – and you can also define relationships between entities (do these become relationships between tables in Power BI?). I think this opens up a lot of new possibilities for close integration between Power BI and other services!

Published by Chris Webb

My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel. View all posts by Chris Webb

Published March 13, 2019

38 thoughts on “Attaching Manually-Created Common Data Model Folders In Power BI Dataflows”

Pingback: Attaching Manually-Created Common Data Model Folders In Power BI Dataflows — Chris Webb’s BI Blog | MS Excel | Power Pivot | DAX | SSIS |SQL
Simon Nuss says:

March 13, 2019 at 1:35 pm

I’m curious to know how the model.json file will work in conjunction with parquet files considering they also store metadata, e.g., field types, inside them.

Loading...

Reply
1. Chris Webb says:
  
  March 13, 2019 at 1:39 pm
  
  Good question – I guess we’ll see when parquet files are supported, which will hopefully be soon.
  
  Loading...
  
  Reply
  1. Simon Nuss says:
    
    March 13, 2019 at 1:53 pm
    
    Fingers crossed. However, I’d be surprised if it’s soon because one big limitation of Power Query for the past few years is that it cannot natively read parquet files. I assume they’ll either: (a) refactor the existing Mashup engine; or (b) fold M to a Spark proxy, e.g., Databricks. Either of this options ain’t easy.
    
    Loading...
  2. Chris Webb says:
    
    March 13, 2019 at 1:56 pm
    
    I don’t think has anything to do with the Power Query engine – with an attached CDM folder there’s no need to use Power Query Online, and in Power BI Desktop my guess is that you’ll be going through a standard API to get data from the dataflow, however it’s created, rather than reading the parquet files directly.
    
    Loading...
Pingback: Attaching Manually-Created Common Data Model Folders In Power BI Dataflows — Chris Webb’s BI Blog – SutoCom Solutions
suhail ali says:

March 13, 2019 at 9:20 pm

If Microsoft separates the SQL Serve storage from the engine and the storage can use Azure Data Lake with CDM metadata then you could go hog wild with SQL over CDM. What a dream. Chris, have you tried to define the relationship metadata in JSON and seen if Power Query can understand it? I doubt it at this point. I think only PowerApps CDS at this point fully utilizes the entire specification.

Loading...

Reply
1. Chris Webb says:
  
  March 14, 2019 at 10:03 am
  
  Defining relationships is the next thing I’m going to test!
  
  Loading...
  
  Reply
  1. Chris Webb says:
    
    March 16, 2019 at 11:00 am
    
    I’ve just tested it, and relationships defined in the model.json file do indeed become relationships in Power BI. Blog post coming soon!
    
    Loading...
2. Melissa Coates says:
  
  March 14, 2019 at 11:47 pm
  
  Suhail – With Azure SQL Data Warehouse, we do have PolyBase which can access remote data in ADLS Gen2 via an external table definition. Remote queries via data virtualization like that don’t perform optimally because the SQL engine has to pull full data files over in order to perform a where statement which retrieves only partial data from a file (i.e., there’s no compute engine natively in ADLS Gen2 currently to perform data pruning or predicate pushdown before the data goes over to the SQL database). Having said all that, the functionality is available so theoretically you can already start going a teensy bit wild with SQL over CDM. =) (For the sake of completeness I feel the need to say…PolyBase is strongly encouraged as the way to *load* data to Azure SQL DW, but use PolyBase with caution for remote queries for the reasons I mentioned above. There’s a lot more to it than that, but that’s the gist.)
  
  Loading...
  
  Reply
  1. suhail ali says:
    
    March 15, 2019 at 4:12 pm
    
    I forgot about Polybase. That does allow SQL over CDM. I had a project using PowerApps and I was shuttling data between the CDM and SQL Server via Microsoft Flow because I had to do some complex calculations that could easily be done with the full breadth of T-SQL. Yes, there’s also a huge comfort level and proficiency I’ve built up with SQL where as using Power Query, Flow and the PowerApps business rules engine could have solved but would’ve taken longer to puzzle it out. Doing reporting off of CDM via Polybase, Power Query or staging the data using Azure Data Factory I can handle and fully agree staging the data in SQL is the way to go for data warehousing. It’s the complex read/write business logic where I have trouble but then there’s such a big ecosystem around CDM now that I want to be leveraging it for reporting and application development.
    
    Loading...
Pingback: Defining Relationships Between Entities In The Common Data Model To Automatically Create Relationships In Power BI « Chris Webb's BI Blog
Pingback: Attaching Manually-Created Common Data Model Folders In Power BI Dataflows | Pardaan.com
Peer Grønnerup says:

May 4, 2019 at 11:19 am

Hi Chris,

Nice posts 🙂

Maybe you have some knowledge of an issue which I am facing with Data Flows using a on-prem source on a shared capacity.

I have stumpled upon a (to me) strange error. I have a PowerBI Dataflow storing data on an underlying ADL Gen2 Storage. The Workspace is not on a Premium Capacity and the source data resides on a on-premise Oracle. So I have setup an on-prem gateway in order to fetch the data.

Running a refresh (manual or scheduled) produces the following error:

Refresh can’t run: This dataflow uses Azure Data Lake Storage Gen2 and an associated gateway with shared capacity. You can enable refresh by removing the gateway or by upgrading this workspace to Power BI Premium capacity.

Of course removing the gateway makes no sense as I need this for the on-prem data. And the documentation says nothing about premium capacity required for refresh from on-prem sources.

As of now I have had to create the data flow as an external data flow. Just like you have demonstrated in this post. Instead of loading files via PBI Dataflow I use ADF and manually maintain the model.json which is pretty irritating as the Oracle source is an old Ax a lot of columns in each entity.

Any insights from you on the above?

Loading...

Reply
1. Chris Webb says:
  
  May 10, 2019 at 10:11 am
  
  That message is strange – I haven’t seen it before, and my understanding is that you are right and Premium should not be necessary for what you’re trying to do (though I may be wrong). Maybe open a support call or go to http://issues.powerbi.com/ ?
  
  Loading...
  
  Reply
  1. Rick says:
    
    November 29, 2019 at 5:19 pm
    
    The thing is:
    On-premises data sources, in Power BI Shared capacities, are not supported in dataflows stored in your organization’s Azure Data Lake Storage Gen2.
    https://docs.microsoft.com/en-us/power-bi/service-dataflows-configure-workspace-storage-settings
    
    Loading...
Miles says:

June 4, 2019 at 8:26 pm

Any idea whether or not the ADLS gen2 storage account has to be in the same tenant as Power BI?

I’m testing via data from a different tenant but am getting a 500 error when trying to create it. I’ve set up the read/execute right via ASE for the Power BI user, the Json URL is in the right format and the file is simple like yours…

Loading...

Reply
1. Chris Webb says:
  
  June 4, 2019 at 8:34 pm
  
  I don’t know – do you mean in the same Azure Region?
  
  Loading...
  
  Reply
  1. Miles says:
    
    June 4, 2019 at 9:00 pm
    
    Both. I’ve tried creating the storage account in the same Azure Region but it still fails. I’ll have to try getting a storage account created in the same Azure Directory/Tenant to see if that works. So far no luck across different tenants.
    
    Loading...
  2. Miles says:
    
    June 4, 2019 at 9:15 pm
    
    Are you using dfs or blob as your endpoint? Should be dfs since its ADLS gen 2 but it seems to vary in the documentation…and ive noticed the self-hosted PBI storage uses the blog endpoint.
    
    Loading...
  3. Miles says:
    
    June 5, 2019 at 10:17 pm
    
    Chris – I got ADLS deployed in the same tenant/directory as PBI and it works!
    
    Also I tested creating a CDM folder with a reference entity from another CDM folder and it worked! This seems to be how MSFT is deploying “Linked Entities”…but this works with only Pro! The key is in pointing to the other CDM json file and then defining your own modelID which gets referenced in the entity section.
    
    Amazing potential here…
    
    Loading...
Ronieri Marques says:

June 11, 2019 at 7:19 pm

Hi, I’ve tried the same steps but keep receiving the message below when adding the CDM folder to dataflow:

CDM folder could not be added.
You cannot add the CDM folder, because you do not have permission to it. Please contact the CDM folder creator to grant you access and try again.

Already gave access to my user and to power bi groups on the folder, files under it and the file system

Any help would really be appreciated

Thanks

Loading...

Reply
1. Ronieri Marques says:
  
  June 11, 2019 at 10:48 pm
  
  …in time, with one difference, I’m not using “powerbi” file system, I’m using another file system I’ve already had on the lake
  
  Regards
  
  Loading...
  
  Reply
Ronieri Marques says:

June 12, 2019 at 1:38 pm

Made it!

But only using “powerbi” file system, anyone made it with file systems already existing, just by changing access rights?

Another problem was that I had to use csv. Semicolon separated columns didn’t work either.

And still doesn’t work with the parquet 🙁

Waiting for the day it connects to Gen 2 as easy as to Gen 1 nowadays

Sheers

Loading...

Reply
1. Chris Webb says:
  
  June 13, 2019 at 4:22 pm
  
  I don’t believe other file systems are supported right now.
  
  Loading...
  
  Reply
Ronieri Marques says:

June 14, 2019 at 1:52 am

Actually it worked, all i had to do was give some access rights, the same that are given when you use Power BI to create a CDM

It’s a good shortcut to make some discovery levereaging data you already have on the lake, also delivering good viz already

Could automate model.json creation on ingestion, and data would be ready to use right away

Regards

Loading...

Reply
Rafael says:

June 24, 2019 at 6:12 am

Hi Chris,

Hope you are well.

In relation to this topic, are you able to use this model.json to map the data?

Let’s say you have sales information from other system and you develop a data flow to read this data but you want to map it to the same model of these files that you already have in your CDM Folder.

Are you able to map it?

Regards,

Rafael

Loading...

Reply
1. Chris Webb says:
  
  July 4, 2019 at 9:56 am
  
  I don’t understand the question properly, but I think the answer here is no
  
  Loading...
  
  Reply
Pingback: Sharing Data Between Organisations With Azure Data Share « Chris Webb's BI Blog
Phil says:

September 8, 2019 at 1:43 am

Are people conforming around a way to automate the creation and maintenance of model.json files? It doesn’t look like there are native ways to do this when working with Azure services.

Loading...

Reply
1. Chris Webb says:
  
  September 8, 2019 at 10:30 am
  
  The native ways to do this are starting to appear slowly. The other week I helped test a scenario where a CDM folder created by Dynamics was attached as a dataflow in Power BI.
  
  Loading...
  
  Reply
  1. Simon Nuss says:
    
    September 8, 2019 at 12:34 pm
    
    A CDM folder created by Dynamics? Is this using the new ADLgen2 replication preview feature?
    
    Loading...
  2. Chris Webb says:
    
    September 8, 2019 at 4:11 pm
    
    Yes, I think that’s what the feature’s called
    
    Loading...
2. Ronieri Marques says:
  
  September 8, 2019 at 3:37 pm
  
  Last month, Microsoft released a Power BI Desktop preview version with Azure Data Lake Storage Gen2 connector, seens like in this case, won’t be necessary to update model.json anymore, but I didn’t tested well
  
  With a folder i use with CDM, i could connect directly using the new connector, with another folder, got an exception, looks like some rights are missing, even thougt I set the same rights as on the good folder
  
  Link about July release:
  
  https://powerbi.microsoft.com/en-us/blog/power-bi-desktop-july-2019-feature-summary/#azureDataLake
  
  Loading...
  
  Reply
Pingback: Power BI Dataflows – My Blog
Jay Vangapandu says:

June 1, 2020 at 3:21 am

is there any way to map metadata from model.json file and tables and load data along with the column names into a Azure sql database

Loading...

Reply
Marco Pozzan says:

December 21, 2020 at 10:28 am

hi chris. Beautiful article. I tried to create a model.json file and import it. What happens is that the connector from power bi reads the dataflow but I don’t see the data. I tried with a simple calendar table with data in a csv but not work correctly. any ideas? thank you

Loading...

Reply
Andrew P says:

March 18, 2021 at 8:32 pm

Hi Chris, I was able to create my model.json file with no issues and can see all of the metadata being correctly displayed in my report that I generated off of my dataflow but there is no actual data in those tables. Do you know what might be missing here? My data lake has been set up properly as far as I know with the correct entity structure etc. Any help would be appreciated!

Loading...

Reply