Keep your existing Power BI data and add new data to it using Fabric

One of the most popular posts on my blog in the last few years has been this one:

Keep The Existing Data In Your Power BI Dataset And Add New Data To It Using Incremental Refresh

To be honest I’m slightly ashamed of this fact because, as I say in the post, the solution I describe is a bit of a hack – but at the same time, the post is popular because a lot of people have the problem of needing to add new data to the data that’s already there in their Power BI dataset and there’s no obvious way of doing that. As I also say in that post, the best solution is to stage the data in a relational database or some other store outside Power BI so you have a copy of all the data if you ever need to do a full refresh of your Power BI dataset.

Why revisit this subject? Well, with Fabric it’s now much easier for you as a Power BI developer to build that place to store a full copy of your data outside your Power BI dataset and solve this problem properly. For a start, you now have a choice of where to store your data: either in a Lakehouse, if you feel comfortable with Spark and notebooks, or in a Warehouse, if you prefer relational databases and SQL. What’s more, with Dataflows Gen2, when you load data to a destination you now have the option to append new data to existing data as well as to replace it.
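To make the difference between the two options concrete, here is a minimal sketch of append versus replace semantics. It does not use any Fabric API: an in-memory SQLite database stands in for the Warehouse or Lakehouse, and the `sales` table, its columns and the `load` function are names I've made up for illustration.

```python
import sqlite3


def load(conn, rows, mode):
    """Load rows into the hypothetical 'sales' table, appending or replacing."""
    if mode not in ("append", "replace"):
        raise ValueError("mode must be 'append' or 'replace'")
    with conn:  # commit on success, roll back on error
        if mode == "replace":
            # Replace: throw away whatever was loaded before
            conn.execute("DELETE FROM sales")
        # Both modes then insert the incoming rows
        conn.executemany(
            "INSERT INTO sales (order_id, amount) VALUES (?, ?)", rows
        )


conn = sqlite3.connect(":memory:")  # stand-in for a real data store
conn.execute("CREATE TABLE sales (order_id INTEGER, amount REAL)")

load(conn, [(1, 10.0), (2, 20.0)], "append")
load(conn, [(3, 30.0)], "append")
appended = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]

load(conn, [(4, 40.0)], "replace")
replaced = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]

print(appended, replaced)
```

After the two appends the table holds all three rows; after the replace only the most recent load survives, which is exactly why append is the option you want when the source no longer holds the older data.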

If you need more complex logic to make sure you only load new records and not ones that you’ve loaded before, there’s a published pattern for that.
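I won't reproduce the published pattern here, but the general idea of loading only new records can be sketched with a simple high-water mark. This is an assumption-laden illustration rather than the pattern itself: SQLite stands in for the Warehouse or Lakehouse, the `staged_orders` table and `append_new_rows` function are hypothetical, and it assumes `order_id` only ever increases in the source.

```python
import sqlite3


def append_new_rows(conn, source_rows):
    """Append only rows whose order_id is above the highest id already staged."""
    # The high-water mark is the largest key loaded so far (0 if the table is empty)
    high_water_mark = conn.execute(
        "SELECT COALESCE(MAX(order_id), 0) FROM staged_orders"
    ).fetchone()[0]
    new_rows = [row for row in source_rows if row[0] > high_water_mark]
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "INSERT INTO staged_orders (order_id, order_date, amount) "
            "VALUES (?, ?, ?)",
            new_rows,
        )
    return len(new_rows)


conn = sqlite3.connect(":memory:")  # stand-in for a real data store
conn.execute(
    "CREATE TABLE staged_orders "
    "(order_id INTEGER PRIMARY KEY, order_date TEXT, amount REAL)"
)

# The second extract overlaps the first: order 2 has already been loaded
first = append_new_rows(conn, [(1, "2023-09-01", 10.0), (2, "2023-09-02", 20.0)])
second = append_new_rows(conn, [(2, "2023-09-02", 20.0), (3, "2023-09-03", 30.0)])
total = conn.execute("SELECT COUNT(*) FROM staged_orders").fetchone()[0]

print(first, second, total)
```

Note that this only handles inserts. A row that changes in the source after it has been staged is skipped, so it is not a substitute for a real upsert.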

“But I’m a Power BI developer, not a Fabric developer!” I hear you cry. Perhaps the most important point to make about Fabric is that Power BI is Fabric. If you have Power BI today, you will have Fabric soon if you don’t have the preview already – they are the same thing. One way of thinking about Fabric is that it’s just Power BI with a lot more stuff in it: databases, notebooks, Spark and pipelines as well as reports, datasets and dataflows. There are new skills to learn but solving this problem with the full range of Fabric workloads is a lot less complex than the pure Power BI approach I originally described.

“But won’t this be expensive? Won’t it need a capacity?” you say. It’s true that to do all this you will need to buy a Fabric capacity. But Fabric capacities start at a much lower price than Power BI Premium capacities: an F2 capacity costs $0.36 USD per hour or $262.80 USD per month, and OneLake storage costs $0.023 USD per GB per month (for more details see this blog post and the docs).
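As a rough worked example of how those numbers combine, the sketch below uses only the prices quoted above. The 730-hour month is my assumption (it is the figure that reproduces the $262.80 monthly price from the hourly rate), the 100 GB of storage is an arbitrary example, and the function name is made up.

```python
HOURS_PER_MONTH = 730              # assumption: 0.36 * 730 gives the quoted 262.80
F2_RATE_PER_HOUR = 0.36            # USD per hour, as quoted above
ONELAKE_RATE_PER_GB_MONTH = 0.023  # USD per GB per month, as quoted above


def monthly_cost(stored_gb, hours=HOURS_PER_MONTH):
    """Estimate monthly USD cost of an F2 capacity plus OneLake storage."""
    compute = F2_RATE_PER_HOUR * hours
    storage = ONELAKE_RATE_PER_GB_MONTH * stored_gb
    return round(compute + storage, 2)


capacity_only = monthly_cost(0)   # capacity running all month, nothing stored
with_100_gb = monthly_cost(100)   # same capacity plus 100 GB in OneLake

print(capacity_only, with_100_gb)
```

At these rates storage is a small part of the bill: 100 GB adds $2.30 a month on top of the capacity.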

So, with Fabric, there’s no need for complex and hacky workarounds to solve this problem. Just spin up a Fabric capacity, create a Warehouse or Lakehouse to store your data, use Dataflows Gen2 to append new data to any existing data, then build your Power BI dataset on that.

4 thoughts on “Keep Your Existing Power BI Data And Add New Data To It Using Fabric”

Lutz says:

September 24, 2023 at 7:36 pm

Most of this seems to be solving the wrong problem. Real world business data is seldom immutable (order information, for example). What we need is differential refresh and the ability to modify dataset partitions, via CDC or other clever means.

“so you have a copy of all the data if you ever need to do a full refresh of your Power BI dataset” – cannot emphasize enough how important this is.

1. Chris Webb says:
  
  September 24, 2023 at 9:57 pm
  
  I think any BI professional who has solved this problem knows that business data is rarely immutable, but the Power BI developers who find my original post and are building simpler self-service solutions just want to add new data to the data they already have. I hope Fabric will make it much easier to handle these more advanced scenarios (for example upserts) in the future.
  
Pingback: Maintaining Existing Power BI Data while Loading More with Fabric – Curated SQL
Pingback: Chris Webb's BI Blog: Keep The Existing Data In Your Power BI Dataset And Add New Data To It Using Incremental Refresh

Published by Chris Webb
