Using Email Attachments As A Data Source In Power BI

Many of my customers get the source data for their Power BI reports – usually Excel or csv files – sent to them as an email attachment on a regular basis. When they get one of these emails what usually happens next is that they save the attachment to a folder so Power BI can connect to it. This a tedious manual process and of course one that breaks down if they’re off sick or away on holiday. However if you are using Exchange or Exchange Online it is possible to eliminate this step and have Power BI connect directly to the attachment without downloading it. In this post I’ll show you how, using Exchange Online as an example.

First of all, the source data. Here’s an inbox with the usual junk in it; the most recent three emails are from the same person, with the same subject line, and each of them has an Excel file attachment called MailSalesDemo.xlsx:

image

The Excel files contain a simple table called SalesTable with some sales data in it:

image

If you’re going to do this properly it will be better to create a folder in your mailbox, and add a rule to make sure that all the emails with the data you need go to that folder, but I’m going to keep things simple and assume that the emails I need are in my inbox.

Next, you need to connect Power BI to Exchange. There are surprisingly few examples of how to do this, but this video is a good place to start:

https://www.youtube.com/watch?v=wvBR41V7_Yk

…and Ken Puls has good post (it’s written for Excel/Power Query but it’s all relevant for Power BI) here:

https://www.excelguru.ca/blog/2014/01/09/using-powerquery-with-email/

 

image

In my case after I selected the Microsoft Exchange Online data source all I needed to do was enter my email address in the Mailbox Address dialog box, and then in the authentication dialog select Microsoft Account and sign in, to get access to my data.

Once you have connected the Navigator pane will appear; you should select Mail and then click the Edit button:

image

 

At this point the Power Query Editor screen will open and you will see a table containing one row for each email in your mailbox. You will now have to filter your emails so you only see the emails with the attachments containing data (don’t worry if there is more than one of these emails though). You will need to something like this:

  • Filter by the Folder Path column so you only get the emails in the relevant folder
  • Expand the Sender column so you can see the name and email address of the sender of each email, and filter so that you only get emails from the relevant person (assuming that these emails are always have the same sender)
  • Filter by the Subject column (assuming that these emails always have the same subject line)
  • Filter the Has Attachments column so you only get the emails with attachments – ie where the Has Attachments column contains a TRUE value

Once you’ve done this you should see a table that looks something like this (I have removed all non-relevant columns too to make things clearer):

image

The final step is to find the most recent email and get the attachment from it. To do this:

  • Sort the table in descending order by the DateTimeReceived column

image

  • Go to the Home tab on the ribbon, click the Keep Rows/Keep Top Rows button and enter the value 1

image

  • UPDATE: something changed since I first wrote this post, resulting in an extra step being necessary at this point. You need to delete all the columns in the table except the Attachments column now; if you don’t, the query will return the specific email that’s listed when you’re designing the query, not whichever email is the most recent. This means the next screenshot is out of date, because it shows the columns that you need to delete, but it’s not a big change.
  • Click on the value Table in the Attachments column of the one remaining row in the table:

image

  • This will display a table with one row for each attachment on the selected email. If there is more than one attachment (which is not the case in my example) filter the table of attachments, for example by the Name column, so that you only have the attachment you want to get data from. Click on the value Binary in the AttachmentContent column of the one row:

image

  • From this point on the experience should be very similar to the one you get when you connect direct to an Excel file: you should see a table containing all the worksheets, tables and named ranges in the Excel file attachment. Click on the Table value in the Data column for whatever worksheet/table/named range you need for your report and then use the Power Query Editor as normal, click Close and Apply and design your report. In my case I clicked on the Table link for the SalesTable table:

image

 

Here’s the very basic report I built:

image

After you have published your report you will also need to set up scheduled refresh on the dataset for a reasonable amount of time after whenever you expect to have received the email containing the data. There’s no need to use a Gateway if you are using Exchange Online, so in the Settings pane for the dataset in PowerBI.com choose Connect directly:

image

Under Data Source Credentials choose OAuth2 in the Authentication method dropdown box and then sign in:

image

Once you’ve done this, every time your report refresh it will use the data from the attachment in the most recently-received email and you have one less thing to remember to do every day. To be honest it’s a bit of a hack but it works so long as the emails you receive always come from the same sender, with the same subject, same attachment and so on.

It’s a shame that Flow doesn’t have a “refresh Power BI dataset” action built-in (why doesn’t it??) because if it did, it would be very easy to refresh the dataset whenever an email with data was received. I know this is possible with Flow if you call the Power BI API but that involves a lot of technical skill to set up. Thinking about it, if you use Flow you could probably solve the problem in a completely different way by saving the email attachment to OneDrive every time an email arrived… but that needs a separate blog post.

25 thoughts on “Using Email Attachments As A Data Source In Power BI

  1. For some reason signed emails seem to appear in Outlook, but not in Power BI. Can someone confirm this?

  2. Hi Chris
    Thank you for your informative blogs.

    I feel, as you have mentioned in the last para, the easiest is to set up flow to save the attachments to a folder in OneDrive personal / OneDrive Business / SharePoint folder. And if the incoming files are of a monthly nature -meaning same excel file but with each months new data – then to a share point folder. Won’t this be the easiest to set-up?

  3. This is very cool. Another way to do this though is to use a Flow to automatically search for these emails and put the attachment in a specific folder. Then Power BI/Power Query only has to continually refresh for files in that folder.

    But the above is handy for those without Office 365 and Flow.

  4. Hi Chris, thanks for sharing this! I’ve experienced many cases like this where the source information flow is originated by mail, and I’ve managed most of them by creating shared folders on google drive or Sharepoint, where the updated / new files are deposited and then integrated into the Power BI Report through fetching the entire folder content. I wonder what is your experience and lessons learned in trying to change the culture in this manner, I’m sure you have many many interesting cases!

  5. Is there a way to connect to Outlook (connected to Hotmail or Gmail) rather than Exchange server

    1. Possibly. It looks like there is a REST API for both Outlook.com and Gmail, but you would need to do quite a lot of work to build a custom data connector to be able to use these APIs in Power BI.

  6. I went all the way to set up a Flow just to move files, but this is even simpler. Sleek and efficent soluton. Many thanks.

  7. Hi, i wonder if this solution will work if attachement name is changing every day (i have an excel export from system that has unique sequence of digits in the end of file name). could you help me?

    1. Make relevant rules to move those emails in a particular location(Which is not ased on attachment name but based on subject of the email, sender etc.) and follow te same logic.

  8. I have been using this method to put emailed alerts in to a format I can use to report on them. I have a question built a rule to send all the alerts to a specific folder. Then I pull all the email from that foldert and put them in to Power Query. its a lot of email growing every day so I wanted to moved the emails already loaded into different folder to speed up the refresh time. I am wondering If move the files will it remove the rows when I refresh the the connection?

  9. Hi There,

    first of all thanks for sharing this solution. Unfortunately i’ve a problem when i want to refresh the information after a new mail was received. It claims to not be able to find a “key does not match a row in the table” (free tranclated from german).
    I guess the problem is this part of the transformation where the attachment is extracted (like a hash of something, which changes from mail to mail?).
    Is there a workaround to tell power bi to take “the first spreadsheet”file or what did i do wrong?

    I appreciate any help. Thanks in advance!

    Part of the Transformation:

    ” #”Beibehaltene erste Zeilen” = Table.FirstN(#”Sortierte Zeilen”,1),
    #”AQMkAGY0MmRlZGMwLWQ0OWEtNDIwOC1hMzA4LTU4NWMzZmYANzEwMzEARgAAAwyWYGhlDlxMms9UFtfcJQMHAIuEUU3+a25DnY8yIedukOwAAAIBDAAAAIuEUU3+a25DnY8yIedukOwAAAIFTwAAAA==” = #”Keep First row”{[Id=”AQMkAGY0MmRlZGMwLWQ0OWEtNDIwOC1hMzA4LTU4NWMzZmYANzEwMzEARgAAAwyWYGhlDlxMms9UFtfcJQMHAIuEUU3+a25DnY8yIedukOwAAAIBDAAAAIuEUU3+a25DnY8yIedukOwAAAIFTwAAAA==”]}[Attachments],
    AttachmentContent = #”AQMkAGY0MmRlZGMwLWQ0OWEtNDIwOC1hMzA4LTU4NWMzZmYANzEwMzEARgAAAwyWYGhlDlxMms9UFtfcJQMHAIuEUU3+a25DnY8yIedukOwAAAIBDAAAAIuEUU3+a25DnY8yIedukOwAAAIFTwAAAA==”{0}[AttachmentContent],

    1. I think something has changed in the Exchange data source since I wrote this post, but I can see what the problem is. In the second step in your example you need to edit the code from

      = #”Keep First row”{[Id=”AQMkAGY0MmRlZGMwLWQ0OWEtNDIwOC1hMzA4LTU4NWMzZmYANzEwMzEARgAAAwyWYGhlDlxMms9UFtfcJQMHAIuEUU3+a25DnY8yIedukOwAAAIBDAAAAIuEUU3+a25DnY8yIedukOwAAAIFTwAAAA==”]}[Attachments],

      so that it is

      = #”Keep First row”{0}[Attachments],

  10. The system producing the information is dynamically generating an attachment name I think because when the next days email comes into the mailbox I get an error.

    Any thoughts?

  11. Thank you very much! This tutorial saved me tons of time and effort!

    Just a note that you must remove the column marked “Id” for some reason in order for this to work.

    Once I sorted that out, I was on easy street.

  12. Thanks very much for this tutorial. It saved me a ton of time.

    Just a note that you have to filter out the column marked “Id” for some reason. If you leave it in, refreshing the report throws an error.

    Once I sorted that out, I was on easy street. Thanks!

  13. Hi Chris!

    This is a great walkthrough! I am having some trouble however getting it to read a third party scheduled attachment as an excel file. The email is always from the same sender, same subject, with an updated but consistent single attachment. The issue is that it is from a third party service and therefore is emailed in ‘protected view’ and the query editor gives me, “DataFormat.Error: External table is not in the expected format.” The only workaround I can find is manually open and save the excel file, but it no longer makes any sense to automate this if I have to manually open and save in order to refresh the data. Have you encountered this before? Is there any way for me to solve it automatically on my end? Thanks for the help!

  14. Hi Chris, this is a great blog; I’ve successfully implemented. If you don’t want to replace the data each time, but keep history, how can you append the data using this process?

  15. Hi chris, How do you add a calculated column into the query with the name of the subject.

    Thanks and looking forward to hearing from you,
    Anand!

  16. What if there is a new daily csv attachment, and you want Power BI only to connect to the most recent file? For an organization.

  17. Thanks for your valuable explanation, could you please help me when importing excel from mail with formate Xls.

Leave a Reply to EdhCancel reply