Many of my customers get the source data for their Power BI reports – usually Excel or csv files – sent to them as an email attachment on a regular basis. When they get one of these emails what usually happens next is that they save the attachment to a folder so Power BI can connect to it. This is a tedious manual process and of course one that breaks down if they’re off sick or away on holiday. However, if you are using Exchange or Exchange Online it is possible to eliminate this step and have Power BI connect directly to the attachment without downloading it. In this post I’ll show you how, using Exchange Online as an example.
First of all, the source data. Here’s an inbox with the usual junk in it; the most recent three emails are from the same person, with the same subject line, and each of them has an Excel file attachment called MailSalesDemo.xlsx:
The Excel files contain a simple table called SalesTable with some sales data in it:
If you’re going to do this properly it will be better to create a folder in your mailbox, and add a rule to make sure that all the emails with the data you need go to that folder, but I’m going to keep things simple and assume that the emails I need are in my inbox.
Next, you need to connect Power BI to Exchange. There are surprisingly few examples of how to do this, but this video is a good place to start:
…and Ken Puls has a good post (it’s written for Excel/Power Query but it’s all relevant for Power BI) here:
In my case after I selected the Microsoft Exchange Online data source all I needed to do was enter my email address in the Mailbox Address dialog box, and then in the authentication dialog select Microsoft Account and sign in, to get access to my data.
Once you have connected, the Navigator pane will appear; you should select Mail and then click the Edit button:
At this point the Power Query Editor screen will open and you will see a table containing one row for each email in your mailbox. You will now have to filter your emails so you only see the emails with the attachments containing data (don’t worry if there is more than one of these emails though). You will need to do something like this:
- Filter by the Folder Path column so you only get the emails in the relevant folder
- Expand the Sender column so you can see the name and email address of the sender of each email, and filter so that you only get emails from the relevant person (assuming that these emails always have the same sender)
- Filter by the Subject column (assuming that these emails always have the same subject line)
- Filter the Has Attachments column so you only get the emails with attachments – i.e. where the Has Attachments column contains a TRUE value
Once you’ve done this you should see a table that looks something like this (I have removed all non-relevant columns too to make things clearer):
The final step is to find the most recent email and get the attachment from it. To do this:
- Sort the table in descending order by the DateTimeReceived column
- Go to the Home tab on the ribbon, click the Keep Rows/Keep Top Rows button and enter the value 1
- UPDATE: something changed since I first wrote this post, resulting in an extra step being necessary at this point. You need to delete all the columns in the table except the Attachments column now; if you don’t, the query will return the specific email that’s listed when you’re designing the query, not whichever email is the most recent. This means the next screenshot is out of date, because it shows the columns that you need to delete, but it’s not a big change.
- Click on the value Table in the Attachments column of the one remaining row in the table:
- This will display a table with one row for each attachment on the selected email. If there is more than one attachment (which is not the case in my example) filter the table of attachments, for example by the Name column, so that you only have the attachment you want to get data from. Click on the value Binary in the AttachmentContent column of the one row:
- From this point on the experience should be very similar to the one you get when you connect direct to an Excel file: you should see a table containing all the worksheets, tables and named ranges in the Excel file attachment. Click on the Table value in the Data column for whatever worksheet/table/named range you need for your report and then use the Power Query Editor as normal, click Close and Apply and design your report. In my case I clicked on the Table link for the SalesTable table:
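All of the steps above are done by clicking in the Power Query Editor, but the rule they add up to (filter, take the newest email, pick one attachment) is easy to get subtly wrong, so here is a minimal sketch of that rule in plain Python. This is not Power Query code and it does not talk to Exchange: the `Email` and `Attachment` classes, their field names and the `latest_attachment` function are hypothetical stand-ins I made up purely to illustrate the logic.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Attachment:
    # Hypothetical stand-in for one row of the Attachments table
    name: str
    content: bytes


@dataclass
class Email:
    # Hypothetical stand-in for one row of the Mail table
    folder_path: str
    sender_address: str
    subject: str
    received: datetime
    attachments: List[Attachment] = field(default_factory=list)


def latest_attachment(
    emails: List[Email],
    folder_path: str,
    sender_address: str,
    subject: str,
    attachment_name: str,
) -> Optional[Attachment]:
    """Return the named attachment from the most recently received matching email."""
    # Filter by folder, sender and subject, and keep only emails with attachments
    candidates = [
        e for e in emails
        if e.folder_path == folder_path
        and e.sender_address == sender_address
        and e.subject == subject
        and e.attachments
    ]
    if not candidates:
        return None

    # Equivalent of sorting descending by received date and keeping the top row
    newest = max(candidates, key=lambda e: e.received)

    # Equivalent of filtering the attachments table by the Name column
    for attachment in newest.attachments:
        if attachment.name == attachment_name:
            return attachment
    return None
```

Note that the function returns nothing at all if the newest matching email does not carry the expected attachment, rather than falling back to an older email. That seems to me to match what the query built above does, and it is worth bearing in mind if the sender ever changes the file name.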
Here’s the very basic report I built:
After you have published your report you will also need to set up scheduled refresh on the dataset, timed to run comfortably after you expect the email containing the data to have arrived. There’s no need to use a Gateway if you are using Exchange Online, so in the Settings pane for the dataset in PowerBI.com choose Connect directly:
Under Data Source Credentials choose OAuth2 in the Authentication method dropdown box and then sign in:
Once you’ve done this, every time your report refreshes it will use the data from the attachment in the most recently-received email and you have one less thing to remember to do every day. To be honest it’s a bit of a hack but it works so long as the emails you receive always come from the same sender, with the same subject, same attachment and so on.
It’s a shame that Flow doesn’t have a “refresh Power BI dataset” action built-in (why doesn’t it??) because if it did, it would be very easy to refresh the dataset whenever an email with data was received. I know this is possible with Flow if you call the Power BI API but that involves a lot of technical skill to set up. Thinking about it, if you use Flow you could probably solve the problem in a completely different way by saving the email attachment to OneDrive every time an email arrived… but that needs a separate blog post.
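To give an idea of what calling the Power BI API involves, here is a minimal Python sketch that builds the HTTP request for the REST API's "refresh dataset" operation. It only constructs the request and does not send it. The dataset ID and access token are placeholders, and getting a valid Azure AD access token (which is where most of the setup effort goes) is assumed to have happened elsewhere; `build_refresh_request` is just a name I have made up for this example.

```python
import json
import urllib.request

# Base URL of the Power BI REST API for datasets in "My Workspace"
API_ROOT = "https://api.powerbi.com/v1.0/myorg"


def build_refresh_request(dataset_id: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that asks Power BI to refresh a dataset."""
    url = f"{API_ROOT}/datasets/{dataset_id}/refreshes"
    # Ask the service not to send a notification email for this refresh
    body = json.dumps({"notifyOption": "NoNotification"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            # The token must have been obtained from Azure AD beforehand (assumed)
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )
```

Passing the result to `urllib.request.urlopen` would actually submit the refresh, so something that runs when an email arrives only needs to make this one call, once it has a token.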