Keep The Existing Data In Your Power BI Dataset And Add New Data To It Using Incremental Refresh

Power BI incremental refresh is a very powerful feature and, now that it’s available in Shared capacity (not just Premium), everyone can use it. It’s designed for scenarios where you have a data warehouse running on a relational database, but with a little thought you can make it do all kinds of other interesting things; Miguel Escobar’s recent blog post on how to use incremental refresh for files in a folder is a great example of this. In this post I’m going to show you how to use incremental refresh to solve another very common problem – namely, how to get Power BI to keep the data that’s already in your dataset and add new data to it.

I know what you’re thinking at this point: isn’t this what incremental refresh is meant to do anyway? Well, yes it is, but as I said it’s designed to work in scenarios where a relational data warehouse stores a copy of all the data that’s in your dataset, and in some cases you don’t have that luxury. For example, you may want to:

  • Connect to a data source that only gives you a set of new sales transactions each day, and add these sales transactions to the ones you have already stored in a Power BI dataset
  • Take a snapshot of a data source, like an Excel workbook, that is changing all the time and store each of these daily snapshots of the contents of the workbook in a Power BI dataset

The current Power BI incremental refresh functionality doesn’t make it easy to do either of these things, and that’s why I’ve written this post.

Let me be clear though: in all these cases I recommend that you don’t use the technique I’m going to show you. If possible, you should stage a copy of each day’s data in a relational database (ie build that data warehouse) or even as text files in a folder (Power Automate may be useful to do this) and use that staged copy as the data source for Power BI. This will allow you to do a full refresh of the data in your dataset at any point in the future if you need to, or create a completely new dataset, even though it means you have to do a lot of extra work. If it isn’t possible to do this, or you’re too lazy or you’re just curious to see how my technique works, read on.
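
As an aside, if you can take the staging route, the Power BI source query then becomes a straightforward combine-files-from-a-folder pattern. Here’s a very rough sketch – not part of this post’s solution – assuming the daily extracts are saved as csv files to a hypothetical folder called C:\StagedData (see Miguel’s post linked above for a proper treatment of this pattern):

let
//Read all the staged csv files from a (hypothetical) folder
Source = Folder.Files("C:\StagedData"),
//Parse each file and promote its headers
Parsed = Table.AddColumn(Source, "Data", each Table.PromoteHeaders(Csv.Document([Content]))),
//Combine all the daily extracts into a single table
Combined = Table.Combine(Parsed[Data])
in
Combined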

For the example I’m going to show in this post I’m going to use a web-based data source, an RSS feed from the BBC News website that returns a list of the current top stories on the site. You can find it here:

http://feeds.bbci.co.uk/news/rss.xml

RSS is based on XML and there’s no authentication needed to access this feed, so loading a table of these top stories into Power BI is very easy – so easy, I’m not going to bother explaining how to do it. Instead I’m going to show you how to use incremental refresh to store a copy of this list of top stories every day in the same table in a dataset.

First of all, for incremental refresh to work, two Power Query parameters of data type Date/Time, called RangeStart and RangeEnd, need to be created. More details about how to create them can be found in the docs here. Power BI expects them to be used to filter the rows loaded into a table; it also uses them to partition these tables in the dataset.
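
In case you haven’t created parameters like these before: a parameter in Power Query is just a query whose value has some extra metadata attached. As a rough illustration – the Manage Parameters dialog generates this M for you, and the initial values below are only placeholders, since the real values are passed in when incremental refresh runs – the two parameters look something like this behind the scenes:

//RangeStart (the initial value is just a placeholder)
#datetime(2020, 4, 1, 0, 0, 0) meta [IsParameterQuery = true, Type = "DateTime", IsParameterQueryRequired = true]

//RangeEnd (the initial value is just a placeholder)
#datetime(2020, 4, 30, 0, 0, 0) meta [IsParameterQuery = true, Type = "DateTime", IsParameterQueryRequired = true]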

Next you need a Power Query query to load the news stories. Here’s the M code for the query:

let
//Connect to the BBC News Top Stories RSS feed
//and create a nicely formatted table
Source = Xml.Tables(Web.Contents("http://feeds.bbci.co.uk/news/rss.xml")),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Attribute:version", Int64.Type}}),
channel = #"Changed Type"{0}[channel],
#"Changed Type1" = Table.TransformColumnTypes(channel,{{"title", type text}, {"description", type text}, {"link", type text}, {"generator", type text}, {"lastBuildDate", type datetime}, {"copyright", type text}, {"language", type text}, {"ttl", Int64.Type}}),
item = #"Changed Type1"{0}[item],
#"Changed Type2" = Table.TransformColumnTypes(item,{{"title", type text}, {"description", type text}, {"link", type text}, {"pubDate", type datetime}}),
#"Removed Columns" = Table.RemoveColumns(#"Changed Type2",{"guid"}),
#"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"title", "Title"}, {"description", "Description"}, {"link", "Link"}, {"pubDate", "Publication Date"}}),
//Find the current date and time when this query runs
CurrentDateTime = DateTimeZone.FixedUtcNow(),
//Find yesterday's date
PreviousDay = Date.AddDays(DateTime.Date(CurrentDateTime),-1),
//Put the current date and time in a new column in the table
#"Added Custom" = Table.AddColumn(#"Renamed Columns", "UTC Data Load Date", each CurrentDateTime),
#"Changed Type3" = Table.TransformColumnTypes(#"Added Custom",{{"UTC Data Load Date", type datetimezone}}),
//Add the filter required for incremental refresh
//Only return rows in this table if:
//a) The RangeStart parameter equals yesterday's date, and
//b) RangeEnd is not null (which should always be true)
#"Filtered Rows" = Table.SelectRows(#"Changed Type3", each DateTime.Date(RangeStart)=PreviousDay and RangeEnd<>null)
in
#"Filtered Rows"

As I said, the first couple of steps aren’t interesting – they just connect to the RSS feed to get the list of top stories as a table. After that:

  • The CurrentDateTime step gets the current date and time at the point when the dataset refreshes. It’s important to note that I’ve used the DateTimeZone.FixedUtcNow function – this not only gives you the current UTC date and time, but it’s guaranteed to give you the same date and time every time you call it within a query. If you use DateTimeZone.UtcNow you may get a different date and time returned every time the function is called within the query, which can make things very confusing (see the short example after this list).
  • The PreviousDay step gets yesterday’s date by extracting the date from the value found in the CurrentDateTime step and subtracting one day.
  • The #”Added Custom” step adds a new column to the table called “UTC Data Load Date” containing the value returned by the CurrentDateTime step. As the name suggests, this column shows when the data in any particular row was loaded.
  • The #”Filtered Rows” step is extremely important. It filters the rows in the table: it returns all the rows in the table if the RangeStart parameter holds yesterday’s date, otherwise it returns no rows at all. It also has an and condition that checks whether the RangeEnd parameter is not null, which should always be true. Without a step that uses RangeStart and RangeEnd in some way you won’t be able to turn on incremental refresh; the significance of only returning data for yesterday’s date will become clear later.
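
To illustrate the difference between the two functions, here’s a small standalone query – not part of the solution, just a sketch you can paste into a blank query in the Power Query Editor. Within a single evaluation the two calls to DateTimeZone.FixedUtcNow always return the same value, whereas the two calls to DateTimeZone.UtcNow may not:

let
//Both calls to FixedUtcNow return the same value within one evaluation of the query...
Fixed1 = DateTimeZone.FixedUtcNow(),
Fixed2 = DateTimeZone.FixedUtcNow(),
//...so this comparison always returns true
FixedValuesMatch = (Fixed1 = Fixed2),
//Two calls to UtcNow, on the other hand, may return slightly different values
NonFixed1 = DateTimeZone.UtcNow(),
NonFixed2 = DateTimeZone.UtcNow()
in
[FixedValuesMatch = FixedValuesMatch, FirstUtcNow = NonFixed1, SecondUtcNow = NonFixed2]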

At this point, unless you happen to have set the value of the RangeStart parameter to yesterday’s date, the table returned by the query will be empty.

Once you have loaded this table to your Power BI dataset (I called it “News”) you need to configure incremental refresh on it. You can do this by right-clicking on the table in the Fields pane in the main Power BI window and selecting Incremental refresh from the menu. Here’s how you will need to configure the settings:

[Image: the Incremental refresh configuration dialog]

You can ignore the warning about query folding and the M query at the top. What you need to do here is:

  • Set the “Incremental refresh” slider to On
  • Under “Store rows in the last” choose Days from the dropdown and enter the number of days you want to store data for (I chose 50 days for this example). You probably don’t need to store data indefinitely; if you did, your dataset might get very large. Data that is older than the number of days specified here will be deleted from the dataset.
  • Under “Refresh rows in the last” again select Days from the dropdown and enter 1.
  • Do not check the boxes for “Detect data changes” and “Only refresh complete day”.

With that done all you need to do is publish to the Power BI Service and set up scheduled refresh for once a day (and no more). When the report refreshes each day, all the stories listed in the RSS feed will be added to the existing stories in the dataset. To monitor this I created a measure called Story Count to count the number of rows in the News table, and then created a simple report that showed the total value of this measure and also showed it broken down by the UTC Data Load Date column. Here’s what it looked like yesterday (the 12th of April 2020), after it had already been refreshed for a few days:

[Image: the report as it looked on 12 April 2020]

and here’s what it looked like today (the 13th of April 2020), with 51 more stories having been loaded in this morning:

[Image: the report as it looked on 13 April 2020]

Job done!

The last thing to point out is that, if your workspace is in Premium, connecting to its XMLA endpoint (or should I say connecting via the “Analysis Services protocol”…?) using SQL Server Management Studio makes it a lot easier to understand what’s happening behind the scenes here. If you right-click on your table in SQL Server Management Studio’s Object Explorer pane and select Partitions:

[Image: the Partitions option in SQL Server Management Studio’s Object Explorer]

…you can see the names of the 51 partitions created to hold the data (remember, above we opted to store 50 days of historical data), the number of rows of data in each partition and the dates that these partitions were last refreshed (or “processed” in Analysis Services terminology):

[Image: the partition names, row counts and last processed dates]

Notice that the partition for today’s date, April 13th, is empty and that both the partitions for today and yesterday (April 12th) have been refreshed today; similarly, the partition for April 11th was last refreshed yesterday, on April 12th. The two most recent partitions are always refreshed, and this is why the #”Filtered Rows” step in the M code above only returns rows when the RangeStart parameter holds yesterday’s date – to make sure that the stories for the current day are only stored once.
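
To see how this plays out, here’s one more small standalone sketch (again, not part of the solution) that evaluates the #”Filtered Rows” condition for the two partitions refreshed on April 13th, assuming the partition boundaries fall at midnight:

let
//When the refresh runs on April 13th, the PreviousDay step evaluates to April 12th
PreviousDay = #date(2020, 4, 12),
//Approximate RangeStart values passed to the April 12th and April 13th partitions
RangeStartApril12 = #datetime(2020, 4, 12, 0, 0, 0),
RangeStartApril13 = #datetime(2020, 4, 13, 0, 0, 0),
//The filter condition is true for the April 12th partition, so today's stories are stored there...
April12PartitionGetsRows = (DateTime.Date(RangeStartApril12) = PreviousDay),
//...and false for the April 13th partition, so it stays empty until tomorrow's refresh
April13PartitionGetsRows = (DateTime.Date(RangeStartApril13) = PreviousDay)
in
[April12 = April12PartitionGetsRows, April13 = April13PartitionGetsRows]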

You can also script out the table to TMSL like so:

[Image: scripting the table out to TMSL in SQL Server Management Studio]

This allows you to see the definitions of each partition and, crucially, the start and end values that are passed to the RangeStart and RangeEnd parameters when they are refreshed:

[Image: the partition definitions, showing the values passed to RangeStart and RangeEnd]

You can download my example pbix file for this post here.

30 responses

  1. “or even as text files in a folder (Power Automate may be useful to do this) and use that staged copy as the data source for Power BI”

    or even better:
    as Excel files as the data source for Excel!
    US government users cannot get a Power BI account. Excel works fine for small to medium size datamarts/reportmarts!

  2. Pingback: Incremental Refresh with Power BI – Curated SQL

  3. Pingback: Dataflows as an Alternative to Incremental Loading in Power BI – Curated SQL

  4. How can Incremental Refresh work on a SharePoint “List”. For instance, when a new item (record, row) is added to a SharePoint List, there is an associated Modified Date. Can an incremental refresh be performed based on a SharePoint List “Item” (row) “Modified Date” ?

    • You would need to use something like the technique Miguel Escobar uses in the blog post I linked to above. Why do you need to do this? Is it really slow to load all the data from your SharePoint list?

  5. This is exactly the kind of pattern I was after for a once-a-day emailed spreadsheet that is unique each day but that I wanted to append to each day.

    However, I thought this could equally be applied to dataflows, but I got some really weird behaviour there. I tried different combinations: creating the parameters first before turning on incremental refresh (the query wouldn’t save), and turning incremental refresh on first, in which case a recurring list of parameters would be auto-created down the editor.

    Reverting to trying with dataset….

    • Interesting, I haven’t tried it with dataflows. Since datasets and dataflows are very different under the covers I can believe that a technique that works for one doesn’t work for the other, but I’ll try to investigate when I have a moment.

      • Hi Chris, Thanks for amazing blog!
        This is exactly what I am trying to achieve but with dataflows. I have three different tables to pull the data from. I followed all your steps but couldn’t get the expected result. So I was wondering if you did get a chance to investigate how the Incremental refresh works with dataflows.
        Thanks!

  6. Pingback: Keep The Existing Data In Your Power BI Dataset And Add New Data To It Using Incremental Refresh | Pardaan.com

  7. I have tried the above solution but it’s still not adding new data; it only shows the new data after a refresh. Can anyone help me please?

  8. Hey buddy.
    How do we define the RangeStart and RangeEnd? You are saying yesterday’s date, but surely that is not constant from the date when we created this solution.
    Don’t we need to put in a dynamic value so that it becomes the previous day’s date whenever the refresh runs?

  9. Great content!

    I tried using this approach to obtain data from a web source, to no avail 🙁

    I am calling a web source (API) and need to have data for the last 12 months. Obviously incremental refresh would be a great option so that I don’t have to pull 12 months of data on every refresh.

    I created the RangeStart and RangeEnd parameters and included them in the web URL’s from and to criteria. I had to create a function to deal with pagination as well, so I loop through the URL for each page of data. I enabled incremental refresh to store data for 12 months and refresh the last 3 days. All looked well in the Desktop version after saving and applying (latest May 2020 release). I published it to the service, but when I hit refresh it failed with missing column errors. I suspected that for older dates some columns might not have data, so I opened the original desktop file again to troubleshoot. Before troubleshooting I changed the RangeStart date to a recent date, just so that I didn’t pull thousands of records into the desktop version, but I couldn’t apply changes – I received an ODBC error. The only way to fix it was to change the RangeStart and RangeEnd parameter type to Date from Date/Time, which broke the incremental refresh functionality and automatically disabled it, which in turn enabled me to apply the changes again. I am not sure if this is an issue with the latest May release or if there are underlying issues using this option with a web source. So I have two issues: one, the incremental refresh implemented with a web source URL is not working in the service; and two, I am unable to make changes to the desktop file once incremental refresh is turned on.

    • It’s hard to say what’s going on here, but I suspect you’ll have to find another way to fix the problem without changing the type of the RangeStart and RangeEnd parameters.

  10. Hi Chris,
    I followed your steps; the 1st day ran perfectly, but on the 2nd day I’m getting an empty table. Can you help me? Maybe I can send you a screenshot or my code?

    • about parameters:
      Current Value of RangeStart was set to 07/08/20 12:00:00 AM
      Current Value of RangeEnd was set to 12/31/20 12:00:00 AM

      There is no relationship between tables RangeStart – RangeEnd – backlog

      The first date was OK, but on the 2nd day the table was empty.

      see below M code:
      ————
      let
      Source = Excel.Workbook(File.Contents("C:\Amdocs\Dashboard\backlog.xlsx"), null, true),
      backlog_Sheet = Source{[Item="backlog",Kind="Sheet"]}[Data],
      #"Changed Type" = Table.TransformColumnTypes(backlog_Sheet,{{"Column1", type text}}),
      #"Promoted Headers" = Table.PromoteHeaders(#"Changed Type", [PromoteAllScalars=true]),
      #"Changed Type1" = Table.TransformColumnTypes(#"Promoted Headers",{{"Inc Call ID", type text}}),
      //Find the current date and time when this query runs
      CurrentDateTime = DateTimeZone.FixedUtcNow(),
      //Find yesterday's date
      PreviousDay = Date.AddDays(DateTime.Date(CurrentDateTime),-1),
      //Put the current date and time in a new column in the table
      #"Added Custom" = Table.AddColumn(#"Changed Type1", "UTC Data Load Date", each CurrentDateTime),
      #"Changed Type2" = Table.TransformColumnTypes(#"Added Custom",{{"UTC Data Load Date", type datetimezone}}),
      //Add the filter required for incremental refresh
      //Only return rows in this table if:
      //a) The RangeStart parameter equals yesterday's date, and
      //b) RangeEnd is not null (which should always be true)
      #"Filtered Rows" = Table.SelectRows(#"Changed Type2", each DateTime.Date(RangeStart)=PreviousDay and RangeEnd<>null)
      in
      #"Filtered Rows"

      ————–

    • Hi Luis and Chris,

      Same issue on my end! Refreshing the next day just blanks the table. I’ll go double check my steps, but hopefully a Power BI update didn’t break this.

  11. Hi Chris,

    I’m having a problem when trying to use an incremental refresh.

    I’m on Premium, I have a file size of 3GB.

    When I’m testing it seems really inconsistent. I’ve set up my RangeStart and RangeEnd on a query that folds, and added the incremental refresh policy to the table.

    Now if I save this file and reopen it, it is almost a lottery as to whether it will recognise that the incremental refresh policy is still there. Sometimes it will not find any parameters (there are parameters, as that is how I set up the incremental refresh in the first place). Sometimes it can’t figure out that it’s query folding.

    I have most recently refreshed it all, made sure it kept my incremental settings and published it. I have tried to refresh it 4 times since. Once it refreshed quickly and seemed to work OK (50 mins to refresh in total). Trying to refresh the same file again now, it’s taking over 3 hours. When I look at the SQL sessions that are running, it looks like it finishes refreshing the whole model, 5 minutes pass, then it restarts running all of the SQL again, trying to refresh all the tables again. In total I think it tries to run all the SQL code 3 times before it ends up failing.

    Does the size of the pbix file have any impact on Incremental Refresh?

    I’ve looked at the XMLA partitions and can see they are in place, so I’m really struggling to figure out a way around this. Because of the size of the file, it takes a long time to try and diagnose any issues, so I was hoping you might have some insight as to whether you have seen or heard of this issue before?

  12. Hi Chris, that was great!

    I’m trying to build a daily tracker for project task progress over time. The issue I have with your method is that I can only see the updated tasks and miss the full picture of the entire project. As an example, let’s say I have 100 tasks with today’s status (60 on hold, 20 started, 20 completed) and want to track the progress day by day as follows:

    – day 2: (55 hold, 22 started, 23 completed)
    – day 3: (50 hold, 25 started, 25 completed)
    – day 4: (50 hold, 20 started, 30 completed)
    etc..
    – day x: (0 hold, 0 started, 100 completed)

    Is there any simple way to do it, since my application is very small and a data warehouse is not feasible?

  13. Cracking piece Chris. A simple read-through of the RangeStart and RangeEnd piece, a copy and paste of the code from your GitHub window, and I have a self-refreshing daily update coming from Salesforce giving me daily data going back 2 months so I can track trends, without having to invest in any costly add-ons to Salesforce.
    You have made a major contribution to Safety within my industry. Thank you.
