Accumulating Data In An Excel Table Using Data Explorer (Power Query) and PowerPivot

NOTE: This post was written before Data Explorer was renamed as Power Query. All of the content is still relevant to Power Query.

One of the first questions I get asked after showing someone PowerPivot for the first time is “Can I add new data to a PowerPivot table that already has data in it?”. Out of the box, of course, the answer is no: when you process a table in PowerPivot you have to reload all the data from your data source; you can’t just append new data (unless you’re using copy/paste to load data, which isn’t a good idea). However, there are a lot of self-service BI scenarios where the ability to do this would be extremely useful: for example, you might want to scrape stock quotes from a web page every day and then, in an Excel workbook, accumulate that data in a table so you can analyse historical stock prices with PowerPivot. I ran into a scenario very much like this last week and I thought that Data Explorer should be able to help here. It can, but it’s not obvious how to do it – hence this blog post!
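
To make the stock quotes example a little more concrete, the kind of query I mean looks something like this. It's just a minimal sketch (the URL is hypothetical, and the shape of the table you get back depends entirely on the page), but it shows how Data Explorer can pull an HTML table from a web page on each refresh:

let
    // hypothetical URL: substitute a page that actually contains a table of quotes
    Source = Web.Page(Web.Contents("http://example.com/stockquotes")),
    // Web.Page returns one row per table found on the page; take the first table's data
    QuotesTable = Source{0}[Data]
in
    QuotesTable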

Here’s a super-simple example, then, of how to accumulate data in a table. Let’s start with a csv file that contains the following data:

Product,Sales
Apples,1
Oranges,2

It’s straightforward to import this data into Excel using Data Explorer and the ‘From csv’ data source.

 

Here’s the code that Data Explorer generates:

let
    Source = Csv.Document(File.Contents("C:\InputData.csv")),
    FirstRowAsHeader = Table.PromoteHeaders(Source),
    ChangedType = Table.TransformColumnTypes(FirstRowAsHeader,
                              {{"Product", type text}, {"Sales", type number}})
in
    ChangedType

 

Now, let’s imagine that you want to keep the data from this file in Excel and every time you click Refresh in Data Explorer you add the data from the file onto the end of the existing data you’ve already captured. The first thing you’ll probably want to do in this scenario is add a new column to the data that gives the date and time that the data was loaded, and you can do that quite easily in Data Explorer using the DateTimeZone.UtcNow() function as follows:

Table.AddColumn(ChangedType, "Load Date", each DateTimeZone.UtcNow())
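
With this added as a new step (called InsertedCustom, which is the name used in the complete query later in this post), the query so far looks like this:

let
    Source = Csv.Document(File.Contents("C:\InputData.csv")),
    FirstRowAsHeader = Table.PromoteHeaders(Source),
    ChangedType = Table.TransformColumnTypes(FirstRowAsHeader,
                  {{"Product", type text}, {"Sales", type number}}),
    // stamp every row with the date and time it was loaded
    InsertedCustom = Table.AddColumn(ChangedType, "Load Date", each DateTimeZone.UtcNow())
in
    InsertedCustom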

Data Explorer has functionality to append the data from one query onto the end of another query, but the problem you have to solve here is that when you click Refresh you want the new data to be appended onto the end of the data that has already been collected. It's a recursive scenario not unlike the one I grappled with here. The solution is first of all to load the data into PowerPivot (ie what we should be calling the Excel Data Model now) by clicking on the Load To Data Model link in the Data Explorer query pane.


Then, on a new sheet, create an Excel query table that returns all the data from the PowerPivot table that you've just loaded data into. Kasper shows how to do this here; there's no need for any special DAX, you just need to connect to the PowerPivot table in the Existing Connections dialog.

At this point you should have two tables on two sheets that contain the same data. The next step is to modify the original Data Explorer query so that it combines the data from the table you've just created (ie the table getting its data from PowerPivot) with the data from the csv file. This can be done with three new steps. First, get the data from the new Excel table (I'm calling it ExistingData here; in the complete query at the end of the post the same table happens to be called Table_Input_Data, so use whatever name your query table actually has):

Excel.CurrentWorkbook(){[Name="ExistingData"]}[Content]

Then, to make sure the Load Date column is treated as a DateTimeZone type (it comes back from the Excel table without a data type):

Table.TransformColumnTypes(GetExistingData,{{"Load Date", type datetimezone}})

Then finally to combine the two tables:

Table.Combine({ChangedType1,InsertedCustom})

Now, whenever you click Refresh on your Data Explorer query, you will see the data from the csv file appended onto the data that has already been loaded.

Here’s the complete code:

let
    // load the new data from the csv file
    Source = Csv.Document(File.Contents("C:\InputData.csv")),
    FirstRowAsHeader = Table.PromoteHeaders(Source),
    ChangedType = Table.TransformColumnTypes(FirstRowAsHeader,
                  {{"Product", type text}, {"Sales", type number}}),
    // stamp the new rows with the date and time they were loaded
    InsertedCustom = Table.AddColumn(ChangedType, "Load Date", each DateTimeZone.UtcNow()),
    // get the data already accumulated, via the Excel table bound to the PowerPivot table
    Custom1 = Excel.CurrentWorkbook(){[Name="Table_Input_Data"]}[Content],
    ChangedType1 = Table.TransformColumnTypes(Custom1,{{"Load Date", type datetimezone}}),
    // append the new rows onto the end of the existing rows
    Custom2 = Table.Combine({ChangedType1,InsertedCustom})
in
    Custom2

Now, as I said, this is just a super-simple example. It's also worth mentioning that I encountered some strange errors and behaviour when implementing this, partly due to Data Explorer still being in preview I guess, so if you want to recreate this query you'll need to follow my instructions exactly. In the real world you'd also need extra functionality to do things like delete rows you've already loaded and so on, but that's all doable I think.
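
For example, if you refresh more than once on the same day, the same csv rows will be appended twice. One way of dealing with that (just a sketch of one possible approach, not something that's in the sample workbook) would be to keep only the previously loaded rows whose Load Date falls on an earlier day before combining, replacing the last step of the complete query above with something like this:

    // keep only rows from the existing data that were loaded on a previous day,
    // so refreshing more than once on the same day doesn't create duplicates
    OldRowsOnly = Table.SelectRows(ChangedType1,
                      each Date.From([Load Date]) < Date.From(DateTimeZone.UtcNow())),
    Custom2 = Table.Combine({OldRowsOnly, InsertedCustom})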

You can download the sample workbook here, and the csv file here.

15 thoughts on “Accumulating Data In An Excel Table Using Data Explorer (Power Query) and PowerPivot”

  1. I really liked this blog; it made me think of accumulating a fact table in Excel with parameterized queries, which spawned a whole other train of thought. It doesn’t appear to have gotten a pingback but I’m going to reference it on my blog southbaydba.com. Thanks for taking the time to write this up!

  2. The idea of moving data from the data model back into Excel is great but a huge inconvenience for large workbooks. I imagine this process would take a long time in those cases? I hope they solve this by adding more features that interact with the data model.

    Thank you Chris.

    1. Chris Webb says:

      Yes, it would be a problem with large amounts of data. As a result I think this technique is a bit too much of a hack to use in the real world.

  3. Hi Chris! Thanks for this post.
    Do you know if there have been any updates that would allow us to handle this scenario more efficiently (as opposed to moving the data from the data model into a worksheet)?

    1. Chris Webb says:

      No, there aren’t any. I don’t think it’s a scenario MS is going to concentrate on – if you want incremental loading that’s going to be when they ask you to upgrade to SSAS Tabular on the server.

  4. Hi Chris, thank you so much for your post. I tried it and it worked, but I think maybe I formatted my system’s date in a different way than yours, so after a few updates things went wrong. But I get the idea of it. I know that MS is not gonna have this function soon, so do you know any alternative ways to get and accumulate routine data from the web (it would be best if it’s free)? Thank you so much

    1. Chris Webb says:

      There’s nothing from Microsoft that I know about; there are some other tools I’ve seen, like import.io, but I don’t know if they allow you to accumulate data rather than just scrape it.

  5. Hello –
    Is this possible in Excel 2010? I have loaded my original query to Power Pivot using Existing Connections in the Design tab. I have then added the additional custom code to append my Power Pivot table, however I get the following error when I try to update the connection –

    **The operation failed because the source database does not exist, the source table does not exist, or because you do not have access to the data source.
    More Details:
    OLE DB or ODBC error: [Expression.Error] A cyclic reference was encountered during evaluation..
    An error occurred while processing table ‘WallysHistory_Weeklyfilescombined’.
    The current operation was cancelled because another operation in the transaction failed.**

    Is there a way to fix this in 2010? Thanks~

    1. Chris Webb says:

      It should work in Excel 2010 but I’m afraid I don’t have it installed so I can’t test it. Are you sure you are using the right tables? That error sounds like it could be caused by you using the table loaded from Power Query as the ‘existing’ data input, not a table sourced from Power Pivot.

    1. Chris Webb says:

      In this case, the table was in the same file. However, this is an old post and the technique was always a bit of a hack, so it may not work on all versions of Excel. The fact is that Power Query doesn’t support incremental refresh and it’s probably safer to accept that and find another solution to your problem outside Excel. Sorry…

  6. Chris, I have been assigned a project where I’m summarizing huge amounts of labor data into one Excel worksheet. Here’s the scenario: I’m going back several years and I have collected the data for the different plants and separated them into different worksheets. There’s about 5 million rows total. I appended these using the Power Query function and uploaded the data to the data model to create a PowerPivot. The problem is that the current year’s data is updated monthly. There is so much data that I was hoping to only store it in the model and delete the tables in the worksheets, keep the table for the current year, add the latest data each month and then refresh the data, adding only the new rows. Then at the end of the year delete the table for that year and append a new year, and so on. It seems as though if there were a way to delete query connections this would be simple, because it would only refresh the existing connections, but the trick would be keeping the data stored in the model from disappearing. Do you have any ideas on how to make this scenario work?

    1. Chris Webb says:

      No, there’s no feasible way of doing this in Excel I think. Maybe you should use separate workbooks for each plant, and a further Excel workbook (with no data on any of the worksheets) to hold the consolidated data in the Excel Data Model.
