Video: Power BI Data Privacy Settings Deep Dive

Over the past year or so I’ve been delivering a presentation on the Power Query engine’s data privacy settings at various conferences and Power BI user groups, in an attempt to pull together all the knowledge I have on this complex topic. Luckily, when I presented this session at the London Power BI User Group recently, they recorded it and posted it on YouTube here:

If you’re struggling with data privacy errors like:

Formula.Firewall: Query ‘Query1’ (step ‘xyz’) is accessing data sources that have privacy levels which cannot be used together. Please rebuild this data combination.

or

Formula.Firewall: Query ‘Query1’ (step ‘xyz’) references other queries or steps, so it may not directly access a data source. Please rebuild this data combination.

…in either the Power Query Editor in Power BI or Power Query/Get&Transform in Excel, then I hope this video will help you understand why you’re getting these errors and what you can do to avoid them.

A lot of what I show in this session draws on other material, such as:

  • My five-part series of posts on the Power Query data privacy settings that starts here
  • My post on how credentials and data privacy settings are stored for dynamic data sources here
  • My post here on the performance overhead of applying data privacy checks
  • Ehren von Lehe’s detailed paper (available here) on how the engine partitions data sources while applying data privacy checks. One of the things I show in the video is that it’s now possible to see these partitions using Power Query Query Diagnostics (see here for some details – although I should probably devote a whole blog post to this in the future)

Lastly, one minor correction to something I said in the video: at the 44:32 mark I do a demo that shows how you can use M functions to avoid a Formula Firewall error. While this trick works in Power BI Desktop and Excel, it results in a dataset that can’t be refreshed in the Power BI Service unfortunately.


Limit The Amount Of Data You Work With In Power BI Desktop Using Parameters And Deployment Pipelines

If you’re working with large amounts of data in Power BI you may find that you have problems because:

  • Your pbix file is very large
  • You spend a long time waiting for refreshes to finish in Power BI Desktop – and if you’re developing, you may need to refresh your dataset frequently
  • It takes a long time to publish your dataset to the Power BI Service

Wouldn’t it be great if there was a way to work with a small subset of your data in Power BI Desktop and then, after you publish, load all the data when you refresh? The good news is that this is now possible with the new deployment pipelines feature in Power BI!

Assuming that you know the basics of how deployment pipelines work (the documentation is very detailed), here’s a simple example of how to do this. Let’s say that you want to use data from the FactInternetSales table in the Adventure Works DW 2017 SQL Server sample database in your dataset. When you import the data from this table and open the Advanced Editor to look at the M code for the query, here’s what you’ll see:

let
  Source = Sql.Databases("MyServerName"),
  AdventureWorksDW2017 = Source{[Name = "AdventureWorksDW2017"]}[Data],
  dbo_FactInternetSales = 
   AdventureWorksDW2017{[Schema = "dbo", Item = "FactInternetSales"]}[Data]
in
  dbo_FactInternetSales

This query, of course, imports all the data from this table. To cut it down to a smaller size, the first thing to do is to create a new Power Query parameter (called FilterRows here) of data type Decimal Number:

Parameter
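
If you open this parameter in the Advanced Editor you can see the M behind it, which should look something like this (a sketch – the exact metadata record that Power Query generates may differ slightly):

// FilterRows: a Decimal Number parameter with a Current Value of 5
5 meta [IsParameterQuery = true, Type = "Number", IsParameterQueryRequired = true]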

Notice that the Current Value property is set to 5. The purpose of this parameter is to control the number of rows from FactInternetSales that are loaded into the dataset. Here’s an updated version of the Power Query query above that uses this parameter:

let
  Source = Sql.Databases("MyServerName"),
  AdventureWorksDW2017 = Source{[Name = "AdventureWorksDW2017"]}[Data],
  dbo_FactInternetSales = 
  AdventureWorksDW2017{[Schema = "dbo", Item = "FactInternetSales"]}[Data],
  FilterLogic = 
   if 
     FilterRows <= 0 
    then 
     dbo_FactInternetSales 
    else 
     Table.FirstN(
      dbo_FactInternetSales, 
      FilterRows
     )
in
  FilterLogic

A new step called FilterLogic has been added at the end of this query that implements the following logic:

  • If the FilterRows parameter is less than or equal to 0 then return all the rows in the FactInternetSales table, otherwise
  • If FilterRows is more than 0 then return that number of rows from the table

Given that the FilterRows parameter is set to 5, this means that the query now returns only the top 5 rows from FactInternetSales:

Top5

It’s important to point out that a filter like this will only make your refreshes faster if the Power Query engine is able to apply the filter without reading all the data in the table itself. In this case it can: with a SQL Server data source query folding ensures that the SQL query generated for the refresh only returns the top 5 rows from the FactInternetSales table:

SQL

Here’s a simple report with a card that shows the number of rows loaded into the table:

Top5Report

At this point you have your cut-down dataset for development in Power BI Desktop.

Next, publish this dataset and report to a workspace that is assigned to the Development slot in a deployment pipeline and then deploy them to the Test workspace:

Pipeline

Then click the button highlighted in the screenshot above to create a new dataset rule that changes the value of the FilterRows parameter to 0 when the dataset is deployed to the Test workspace:

ParameterRule

With this rule in place, when the dataset in the Test workspace is refreshed, the logic in the query above now ensures that all the data from the FactInternetSales table is loaded into the dataset. Instead of just 5 rows, the report now shows that the full 60000 rows of data have been loaded:

FullDataReport


Monitoring Power Query Memory Usage With Query Diagnostics In Power BI

In the April release of Power BI Desktop the Power Query Query Diagnostics feature was enhanced so that you can now return performance counter data. As the blog post says:

When you run performance counters, every half second Power Query will take a snapshot of resource utilization. This isn’t useful for very fast queries but can be helpful for queries that use up a lot more resources.

When might this be useful in the real world? In my last blog post I had a chart that shows the amount of data that Power Query reads from disk while loading a large-ish JSON file, created from data collected in Process Monitor using this technique. Here it is again:

SecondLoad

The x axis is relative time in seconds from when Power Query started reading the data; the y axis shows the amount of data read. Notice how data is read at a constant rate for the first 1.5 seconds, but that after the 1.5 second mark the throughput flattens? What could be causing this?

Although this does not happen consistently, and I didn’t collect the necessary data when I ran this particular test, the answer is likely to be related to how the Power Query engine uses memory.

This is a topic I have blogged about before and I strongly recommend that before carrying on you read this post on the Container Size property that can be set on dataflows in Power BI Premium. Here’s a quote from Curt Hagenlocher of the Power Query dev team from that post that is relevant here too, about how the Power Query engine uses memory when a query executes:

Certain operations force the rows of a table value to be enumerated. If the enumeration operation itself is expensive, then using Table.Buffer can be a performance optimization because we store the values in memory so that second and subsequent enumerations of the rows go against memory.

If the table is only being enumerated once (which is the most common scenario) or if the underlying enumeration is fast anyway, then Table.Buffer won’t help performance.

Table.Buffer can actually hurt performance in some cases, because we cap RAM usage of the query at 256 MB — which means that a query which uses more than 256 MB is now forced to page RAM to/from disk. Enough paging, and the performance cost can be quite dramatic.

Currently, “table at a time” operations like joins, sort, many groupings, pivot, unpivot, etc., all happen in RAM (unless folded). For large tables, these will consume a lot of memory.
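
To put Curt’s point about Table.Buffer in context, here’s a minimal sketch of how it is typically used – the file path, column name and step names are all hypothetical:

let
  //A hypothetical source that is expensive to enumerate
  Source = Csv.Document(File.Contents("C:\SomeLargeFile.csv"), [Delimiter = ","]),
  //Store the rows in memory so that the two steps below, which each
  //enumerate the table, read from memory instead of re-reading the file
  Buffered = Table.Buffer(Source),
  RowCount = Table.RowCount(Buffered),
  Sorted = Table.Sort(Buffered, {{"Column1", Order.Ascending}}),
  //Add the row count as a new column on the sorted table
  Output = Table.AddColumn(Sorted, "Total Rows", each RowCount)
in
  Output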

Now it turns out that when Power Query reads a JSON file it has to load the whole file into memory – something that isn’t true for other types of data source. So maybe, while reading the JSON file used in my test, the slow-down in throughput was caused by paging?

With the option to collect performance counters turned on (see the announcement blog post for how to do this) I opened up the pbix file containing the query from my last blog post, went to the Power Query Editor, and collected some query diagnostics data by right-clicking on the last step in my query and selecting Diagnose:

Diagnose Step

It soon became clear that the Power Query engine has a pool of mashup containers that it reuses (yes, go and read that blog post I told you to read!) and it seems like they often need to do some garbage collection before a query runs. So, to get a nice-looking graph, I did something highly unsupported that still seemed to work: I opened Task Manager and killed all the Microsoft Mashup Evaluation Container processes I could see under Power BI. Having done this, when I collected my performance counter data I could build the following graph showing Power Query memory usage while the query was evaluating:

Memory

The x axis shows the number of seconds elapsed since the start of the query; the y axis shows the value in bytes for the Commit and Working Set performance counters. The yellow line, for the Commit (bytes) performance counter, shows the amount of virtual memory used by Power Query. The blue line, for the Working Set (bytes) performance counter, shows the amount of physical memory used by Power Query; as you can see it reaches 256MB (indicated by the red dotted line) halfway through and never exceeds that. While Commit is greater than Working Set, paging must be happening and Power Query performance may suffer as a result.

Over the years I have found that one of the most effective ways of improving Power Query performance is to try to avoid transformations like those that Curt mentions above that require large tables to be held in memory – although that’s easier said than done. At least now we have an easy way of seeing where memory might be causing problems for Power Query performance using this new feature.

Speed Up Data Refresh Performance In Power BI Desktop Using Table.View

It can sometimes be frustrating to work with slow data sources or complex Power Query queries in Power BI Desktop: you open the Power Query Editor, make some changes, click Close & Apply and then wait a loooong time for your data to refresh. In this post I’m going to show you a technique that can cut this wait by up to 50%. It involves some fairly complex M code but I promise you, the effort is worth it!

In fact, what I’m going to describe is more or less what I showed towards the end of my appearance on Guy In A Cube last year and in a few other posts, but at that time I didn’t understand properly why it worked or what the performance implications actually were. Now, thanks to a lot of help from Curt Hagenlocher of the Power Query development team I have all the details I need to blog about it.

Let’s see a simple example. Say you have a large JSON file – for this test I generated one that is 67MB containing random data using this handy online tool – and you want to load it into Power BI. You’ll end up with a query that looks something like this:

let
  Source = Json.Document(File.Contents("C:\generated.json")),
  #"Converted to Table" = Table.FromList(
    Source, 
    Splitter.SplitByNothing(), 
    null, 
    null, 
    ExtraValues.Error
  ),
  #"Expanded Column1" = Table.ExpandRecordColumn(
    #"Converted to Table", 
    "Column1", 
    {"_id", "index", "guid", "isActive", "balance"}, 
    {"_id", "index", "guid", "isActive", "balance"}
  ),
  #"Changed Type" = Table.TransformColumnTypes(
    #"Expanded Column1", 
    {
      {"_id", type text}, 
      {"index", Int64.Type}, 
      {"guid", type text}, 
      {"isActive", type text}, 
      {"balance", type text}
    }
  )
in
  #"Changed Type"

Here’s what the output of this query looks like:

Output

On my laptop this query takes around 7-8 seconds to load. Using Process Monitor to see how much data is being read from the file (see this post for more details) shows that Power Query is reading the data from this file twice. This graph, generated using data from Process Monitor, has time on the x axis and amount of data read from the file on the y axis, and the two peaks indicate that the data is being read twice:

FirstLoad

Why? When you refresh a table in Power BI Desktop two things happen:

  • First, Power BI has to check what columns are present in the table. To do this, it runs the query for the table but asks the Power Query engine to filter the table so it returns zero rows.
  • Secondly the query is run again but this time with no filter, so all rows are returned, and this is when the data is actually loaded into the table in Power BI.

It’s on this first run of the query where problems can occur. If you’re using a data source like SQL Server and you’re not doing any complex transformations then query folding will take place, so Power BI’s request to return the table but with no rows can be handled very efficiently. However, if you’re using a data source where query folding is not possible (such as Excel, CSV or JSON files) or you have complex transformations that stop folding taking place then the only way Power Query can work out what columns the query returns is by running the entire query. That’s what is happening in the example above.
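
Conceptually, the zero-row run on a non-folding source is equivalent to evaluating something like the following against your query (just a sketch – Query1 stands in for the query being refreshed):

let
  //Ask for zero rows from the query...
  ZeroRows = Table.FirstN(Query1, 0),
  //...because all Power BI needs at this point is the list of columns
  Columns = Table.ColumnNames(ZeroRows)
in
  Columns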

This only happens in Power BI Desktop, though. When you refresh a dataset in the Power BI Service the query is only run once.

The solution is to trick the Power Query engine into running these zero-row filter queries instantly, and you can do that using the Table.View M function. I blogged about Table.View here (and I strongly suggest you read that post before carrying on) but that example didn’t quite get to the solution you need here. Here’s a new version of the query above with an extra step:

let
  Source = Json.Document(File.Contents("C:\generated.json")),
  #"Converted to Table" = Table.FromList(
    Source, 
    Splitter.SplitByNothing(), 
    null, 
    null, 
    ExtraValues.Error
  ),
  #"Expanded Column1" = Table.ExpandRecordColumn(
    #"Converted to Table", 
    "Column1", 
    {"_id", "index", "guid", "isActive", "balance"}, 
    {"_id", "index", "guid", "isActive", "balance"}
  ),
  #"Changed Type" = Table.TransformColumnTypes(
    #"Expanded Column1", 
    {
      {"_id", type text}, 
      {"index", Int64.Type}, 
      {"guid", type text}, 
      {"isActive", type text}, 
      {"balance", type text}
    }
  ),
  OverrideZeroRowFilter = Table.View(
    null,
    [
      GetType = () =>
        type table [
          _id = Text.Type,
          index = Int64.Type,
          guid = Text.Type,
          isActive = Text.Type,
          balance = Text.Type
        ],
      GetRows = () =>
        #"Changed Type",
      OnTake = (count as number) =>
        if count = 0 then
          #table(
            type table [
              _id = Text.Type,
              index = Int64.Type,
              guid = Text.Type,
              isActive = Text.Type,
              balance = Text.Type
            ],
            {}
          )
        else
          Table.FirstN(#"Changed Type", count)
    ]
  )
in
  OverrideZeroRowFilter

The OverrideZeroRowFilter step is where the magic happens. It uses Table.View to override query folding behaviour by intercepting what happens when the table returned by the query is filtered. There are three fields in the record in the second parameter of Table.View that you need to change:

  • GetType returns a table type that describes the columns and their data types present in the output of the query. The five columns listed here are the five columns you can see in the screenshot of the query output above. It’s very easy to generate the required M code when you use the custom function that I blogged about here.
  • GetRows returns all the rows that the query can return, that’s to say the table returned by the #”Changed Type” step.
  • OnTake is used when a top n filter is applied to the table returned by the query.  In this case the code looks at the number of rows that are being requested (specified in the count parameter) and if that’s zero, it just returns an empty table with the same columns as the #”Changed Type” step; if it’s more than zero then it uses Table.FirstN to get the actual number of rows requested from #”Changed Type”. This means that when Power BI does that zero-row filter it can now happen immediately because there’s no need to go back to the data source or execute any of the transformations in the query.

This version of the query now runs in 4-5 seconds, and Process Monitor now shows that the JSON file is only read once and, obviously, reading the data once is a lot faster than reading it twice:

SecondLoad

Look at how the graph flattens at the end… but that’s something for another blog post.

There is a downside to this approach: you have to hard-code the table schema that you expect your query to return, and if you change your query to return different columns you’ll have to update the table type in the last step.

I’ve used this technique on other slow queries and it has made a massive difference to the development experience in Power BI Desktop – one query that was taking five minutes to refresh when I closed the Power Query Editor went down to two and a half minutes. If you try this out yourself please let me know if it makes a difference by leaving a comment.

[Don’t forget that there are other things you can do that will also speed up the development experience for slow queries: you should definitely turn off the “Allow data preview to download in the background” option, and you might want to consider turning off data privacy checks so long as you fully understand the implications]


Power Query Formatter

If you’ve ever used DAX Formatter to format your DAX code and wondered why there isn’t an equivalent for Power Query/M, then wonder no more: the nice people at Power Pivot Insights in Germany (read their German-language blog here) have built one. You can find it here:

https://powerqueryformatter.com/

PQ Formatter

They also have an API you can use.

Bonus information: did you know that Microsoft also has an open source parser for M, available here?

Calling The Power BI Export API From Power Automate, Part 3: Creating An Alerting Solution

In my last two posts (see here and here) I showed you how you can call the new endpoints in the Power BI REST API for exporting a report. There are some obvious, extremely useful applications for this such as emailing PDF exports of a report out to large groups of users. In this post, however, I’ll show you how you can use this for something different: creating a Power BI alerting solution.

Now I know what you’re thinking: we already have alerts in Power BI and we can also trigger a Power Automate flow when an alert fires. This functionality is a bit limited though: you can only create an alert on a dashboard, not a report, and then only on some visuals; what’s more you can only use fairly simple rules to trigger an alert. You may need something more sophisticated, but while Power Automate would seem to be the perfect tool for building an alerting solution there’s another problem: it isn’t possible to query a Power BI dataset from Power Automate… until now.

How? The new export endpoints in the Power BI REST API allow you to export a paginated report to a number of different formats, one of which is XML. This means you can write any DAX query you want, use it in a table in a paginated report, export the paginated report to XML in Power Automate and bingo – you have the output of the query in a format that Power Automate can read and do something useful with.

Let’s see an example. Take the following table of data in a Power BI dataset:

Source data

It shows sales for different countries, and let’s say that if the sales value exceeds the given threshold then you want to send an email to the address given in the last column.

The following DAX query filters this table to return all the rows where sales is greater than the threshold:

EVALUATE
FILTER(
'Sales',
'Sales'[Sales]>'Sales'[Threshold]
)

Query output

It’s quite easy to create a basic paginated report in Power BI Report Builder with just a single tablix to display the output of this query:

Paginated report output

Now, let’s take the Power Automate flow that I described in my last post and alter it slightly.

First of all, instead of exporting to CSV as I did last time, you need to change the action that calls the Export To File endpoint to export the report to XML:

XML export option

More substantial changes are needed at the end of the flow, where the exported report is returned. Here’s what this part of the flow looks like at a high level after the changes:

Overview End

The first action shown here, Download the exported report file, gets the XML returned from Power BI. The SSRS documentation has a lot of detail about how a report gets rendered to XML here, but by keeping the report very basic it’s easy to understand the format of the XML. Here’s what gets returned in this case:

<?xml version="1.0" encoding="UTF-8"?>
<Report 
xmlns="AlertsPaginatedReport" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="AlertsPaginatedReport 
https://redacted" 
Name="AlertsPaginatedReport"> 
<Tablix1> 
<Details_Collection> 
<Details Country="UK" Sales="5" 
Threshold="2" Email="chris@mycompany.com" /> 
<Details Country="New Zealand" Sales="6" 
Threshold="1" Email="phil@mycompany.com" /> 
<Details Country="Netherlands" Sales="5" 
Threshold="2" Email="kasper@mycompany.com" /> 
</Details_Collection> 
</Tablix1> 
</Report>

The approach I’ve used to consume this XML in Power Automate is basically the one described in this post I found. The Compose action uses an XPath query to return an array containing one item for each row in the query results; writing the XPath query was quite painful but I finally got it working. Here’s the expression from the action:

xpath(
  xml(body('Download_the_exported_report_file')),
  '/*[local-name() = ''Report'']/*[local-name() = ''Tablix1'']/*[local-name() = ''Details_Collection'']/*[local-name() = ''Details'']'
)

Next, an Apply to each is used to iterate over each item in this array:

Apply to each


Finally, inside this loop, another Compose action retrieves the email address from the current iteration and this is then used to send an email:

Get email

Here’s the expression used in the Get Email Address action:

xpath(
  xml(item()),
  'string(/*[local-name() = ''Details'']/@Email)'
)

And that’s it. It’s a bit of a convoluted workaround, I admit, but it does the job; please also bear in mind the limitations of the export API listed here.

I’m sure there are lots of other things you can do, apart from alerting, with the ability to consume the output of a DAX query in Power Automate, so if you have any good ideas please let me know in the comments!

Calling The Power BI Export API From Power Automate, Part 2: Creating A Flow That Exports A Paginated Report To A CSV File

In my last post I showed how to create a Power Automate custom connector for the new endpoints in the Power BI REST API for exporting a report to a file. In this post I’ll show you how to use this custom connector in a flow in Power Automate.

The three endpoints in the Power BI API that need to be called to export a report (and which have been included in the custom connector from my last post) are as follows:

  • Export To File In Group must be called first, to start the process of exporting either a Power BI report or a paginated report to a file. Exporting may take some time, though, so the exported file doesn’t get returned at this point. In the headers of the response there is a retry-after value in seconds telling you how long you must wait initially, before…
  • …calling Get Export To File Status In Group to find out whether the export is ready or not. If it isn’t ready you again need to wait the number of seconds specified in the retry-after header before checking the status again. If it is ready, you can then…
  • …call Get File Of Export To File In Group to download the export file.

This is all described in the C# example code here, so the challenge is to translate this into a flow. In my case, I want to export the following paginated report to a CSV file:

Paginated report

Here’s what my flow looks like at the top level:

High level view

To make things easy to understand I’m using a manual trigger to start the flow and hard-coding the report I want to export. The second Action calls Export To File In Group:

Start export

You will need to go into the Settings for this Action and set the Asynchronous Pattern option to Off:

AynchOff

As I said, in the response from this first API call the retry-after header tells you how many seconds you should wait before checking the export status for the first time. This value is used in the next Action, which adds the delay:

First delay

Once this delay has passed there’s a Do loop that will call the Get Export To File Status In Group endpoint up to ten times to check if the export is ready:

Do loop

This call uses the export id found in the response from the call to Export To File In Group:

Check status

If this call returns that the status is Running or NotStarted, then it will again wait for the number of seconds specified in the retry-after header; if not then there is no delay:

Second delay

The expression on the Do loop breaks the loop if the export status is Succeeded or Failed:

@or(
  equals(body('Call_Export_Status_endpoint_to_check_if_export_is_ready')?['status'], 'Succeeded'),
  equals(body('Call_Export_Status_endpoint_to_check_if_export_is_ready')?['status'], 'Failed')
)

After the loop, there is a check to see if the export status is Succeeded:

Last condition

If the export status is Succeeded, then the file is downloaded and saved to OneDrive for Business:

Save file

Otherwise, the flow sends a failure notification email:

Failure email

And here’s the exported CSV file viewed in Excel Online:

Excel

And that’s it. I know this flow isn’t as robust as it could be, but I hope it gives you an idea of how to use the custom connector to export Power BI reports in Power Automate; as I said last time, I have very little Power Automate experience so please forgive any newbie mistakes! If you can see a way to improve on what I’ve done here, please leave a comment.

Automated exports from a Power BI paginated report to a CSV file are already very useful but there are other fun things we can do with this – and in my next post I’ll show you another, less obvious example.

Calling The Power BI Export API From Power Automate, Part 1: Creating A Custom Connector

Recently, a new set of endpoints in the Power BI REST API for exporting Power BI reports and paginated reports to files went into public preview (see the main announcement here and the paginated reports announcement here). Since there are all kinds of cool things you can do with this I thought I would write a few posts on how to use these endpoints in Power Automate. In this post I’ll talk about setting up a custom connector in Power Automate; in the next post I’ll talk about how to use this custom connector in Power Automate; and after that I’ll show you some less obvious uses for all this.

Before we go any further, and before you get too excited, you should read the limitations of the public preview listed here, especially those around the number of report pages that can be exported per hour and the number of pages in a report that can be exported. Also, this functionality is only available with Power BI Premium or Power BI Embedded.

Creating a Power Automate custom connector for the Power BI REST API is something that several people have blogged about already in detail. Konstantinos Ioannou has a very detailed walkthrough here; Jese Navaranjan has a video walkthrough here; and I blogged about how you could use the Swagger definition of the Power BI REST API to create a custom connector here. I’m not going to go over all these steps again but there are a few specifics that need pointing out if you want to build your custom connector manually. If you’re lazy, I exported my custom connector to a Swagger file and you can download it here; you should be able to import it and create your own custom connector very easily.  I don’t pretend to be a Power Automate expert so please excuse any newbie mistakes!

The three endpoints that you’ll need to use to export to a file (unless your report is in your My Workspace, in which case there are three other equivalent endpoints) are Export To File In Group, Get Export To File Status In Group and Get File Of Export To File In Group. There’s good documentation for regular Power BI reports here and paginated reports here, but in summary Export To File In Group starts the export process but because this might take a long time, doesn’t return the exported file; Get Export To File Status In Group allows you to check the status of the export; and Get File Of Export To File In Group returns the file once the export is ready.

Creating a custom connector in Power Automate is a four-step process and steps 1, 2 and 4 (“General”, “Security” and “Test”) are well covered in the guides above. Step 3 (“Definition”) is where you need to create three Actions for the three endpoints above.

After filling in the General and Security pages, go to the Definition page and click the New action button to create an Action for the Export To File In Group endpoint. You first need to fill in the information in the General section:

ExportFileGeneral

Then in the Request section click the Import from sample button, select POST and paste

https://api.powerbi.com/v1.0/myorg/groups/{groupId}/reports/{reportId}/ExportTo

…into the URL box and the sample payload here into the Body box:

ExportFileRequest

Setting up the other two endpoints is similar except that you need to select GET instead of POST and you don’t need to paste anything into the Body box. Here are the two templatised URLs to use for them:

Get Export To File Status In Group: https://api.powerbi.com/v1.0/myorg/groups/{groupId}/reports/{reportId}/exports/{exportId}

Get File Of Export To File In Group: https://api.powerbi.com/v1.0/myorg/groups/{groupId}/reports/{reportId}/exports/{exportId}/file

There’s more to do on this page, but at this point you should go to the Test page and test these three new Actions. For Export To File In Group you need to at least enter a groupId, a reportId and a format:

TestBox

The groupId and reportId can be found by navigating to your report in the Power BI portal; you can extract the groupId and reportId from the URL like so:

URLgroupsreports

The valid values for format (and the other parameter values) are in the docs.

Assuming that it all works, you will be able to scroll down and see the response. Copy all the JSON in the Body section and save it somewhere temporarily.

ExportId

Do the same thing for the other two Actions you have created; these two actions take a third parameter called exportId, which is the id value in the response that I’ve highlighted in the screenshot immediately above.

Once you have the Body text from Export To File In Group and Get Export To File Status In Group (the response from Get File Of Export To File In Group should be your report export), go back to the Definitions page and for these two Actions scroll down to the Response section and click on Add default response.

ResponseSample

There are two things to do here for both Actions. First, paste the responses you got on the Test pane into Body; next, paste:

retry-after 30

…into the Headers section and then click Import. This will make the output of the Actions much easier to consume in Power Automate later on.

Your custom connector is now ready, and in part 2 of this series I’ll show you how to use it in Power Automate.

[Thanks to Jaime Tarquino and Chris Finlan for their help getting this working]


Five Mistakes To Avoid When Migrating To Power BI From Another BI Platform

Now that Power BI is so popular, it’s becoming more and more common for organisations to migrate their reports from other BI platforms to Power BI. And why not? You’ll be moving to a modern, industry-leading, cloud-native BI platform and probably saving a lot of money in licensing costs too. As a Microsoft employee I wholeheartedly encourage this (obviously!) but it’s not without its pitfalls. In this post I’m going to highlight some common mistakes I’ve seen people make when migrating to Power BI so that you can avoid making them yourself. It isn’t always a simple lift-and-shift job…

#1 You may need to remodel your source data

Every report has a data source and getting source data in the right format for your BI platform is a substantial task – so much so that you might be tempted to put Power BI on top of the data sources you have created for your previous BI platform with no changes. However, different BI platforms need their data in different formats. Many BI platforms like their data munged together in one big table, sometimes even with data at different granularities in the same table. Power BI, on the other hand, likes its source data modelled as a star schema (you can find out what a star schema is and why it’s important here). If you don’t model your data as a star schema you may find that you see incorrect values in your reports, that report performance is poor, and that it’s a lot harder to write the DAX calculations that you need.

#2 Understand how Power BI works with data

Even if you have a star schema, you also have to understand that how Power BI works with data may be different to the way your previous BI tool worked. As Marco Russo and Alberto Ferrari explain here, Power BI wants you to invest time up-front to create a semantic layer and once you have done this you’ll find that building reports and calculations is much quicker and easier. If you’re wondering why you can’t just write a SQL query to get the data for that chart you need to build, you’ve made this mistake.

#3 Don’t try to recreate the exact functionality of your old BI platform

If you’re building a Power BI report to replace an existing report on a legacy platform, and you ask your users what they want the report to look like, the most common reply is “Just like the old one”. This is a danger sign! One surefire way to make a Power BI report hard to build and slow to run is to try to make it do something it wasn’t designed to do.

One example of this is the way some users ask for reports that recreate the gigantic tables they are used to seeing in their old Excel reports. These tables may be slow to render in Power BI (especially if they have hundreds or thousands of rows) and what’s more, they don’t make the data easy to understand – better visualisation options are always available. My colleagues Adam and Patrick discussed this problem in detail in a recent video that’s well worth watching.

Another, more specific example is the requirement to allow end users to change the measures or fields that are shown in a visual in a Power BI report. This is a feature we have on our public roadmap but I’ve seen many customers tie themselves in knots trying to implement this using existing functionality; it is possible, but just because something is possible doesn’t mean you should do it. Once again, needless complexity, slower development and performance problems are the result.

Instead what you should be doing is understanding the business problem the report is trying to solve and implementing a solution using the functionality Power BI has built in. I know that users can be very stubborn about things like this, but trust me, it’s better to have the battle now rather than deal with the consequences of doing what they want you to do.

#4 Don’t forget about Analyze in Excel and Paginated Reports

Following on from the last point, if your users want to be able to explore their data by changing the measures and fields used in a visual they are probably thinking of how they use PivotTables and PivotCharts in Excel. And if that’s what they want, why don’t you let them use the real thing? Power BI’s Analyze in Excel feature doesn’t give you something that looks like an Excel PivotTable, it gives you an actual PivotTable connected to data stored in Power BI. It’s the functionality your users are comfortable with, without any of the downsides of traditional Excel reports such as the tedious, error-prone, manual data refreshes. What’s more, you can also use Excel cube functions for more complex report layouts, such as those needed by financial reports. This video by Peter Myers is a great introduction to cube functions – they work with Power BI datasets in the same way they work with Analysis Services cubes or Power Pivot.

Similarly if, after everything I’ve just said, you still want to create reports sourced from hand-written SQL queries with gigantic tables that can be printed easily, you’ll find that Power BI Paginated Reports work much better than regular Power BI reports. Closely related to SQL Server Reporting Services, one of the most popular BI platforms in history, paginated reports have a different set of strengths compared to regular Power BI reports. Use the right tool for the job!

#5 This is your chance to change who does what

Migrating to Power BI is a chance to make a break from the old ways of doing things, and that includes who does all the work. If you’ve worked in BI for any length of time you’ll have seen plenty of examples of these two extreme approaches to BI development and the pain that goes with them:

  • Corporate BI, where all the work is done by the IT department – which quickly becomes the bottleneck, unresponsive to business needs and unable to understand them properly.
  • Excel Hell, where the business builds everything itself – and ends up with a lot of duplicated effort, multiple versions of the truth, and reports that break when the person who built them leaves for a new job.

Power BI lets you chart a course between these two. It’s easy enough to use, and affordable enough, for you to deploy it to a larger number of users in your organisation than any other BI tool except Excel. At the same time it gives you the tools you need to avoid the problems of Excel Hell: centralised data, automated refresh, security, monitoring and a lot more. This, for me, is the key to all successful Power BI deployments: empowered users and the IT department working together as a team, each doing what they do best.

Keep The Existing Data In Your Power BI Dataset And Add New Data To It Using Incremental Refresh

Power BI incremental refresh is a very powerful feature and now that it’s available in Shared capacity (not just Premium) everyone can use it. It’s designed for scenarios where you have a data warehouse running on a relational database but with a little thought you can make it do all kinds of other interesting things; Miguel Escobar’s recent blog post on how to use incremental refresh for files in a folder is a great example of this. In this post I’m going to show you how to use incremental refresh to solve another very common problem – namely how to get Power BI to keep the data that’s already in your dataset and add new data to it.

I know what you’re thinking at this point: isn’t this what incremental refresh is meant to do anyway? Well, yes it is, but as I said it’s designed to work in scenarios where a relational data warehouse stores a copy of all the data that’s in your dataset and in some cases you don’t have that luxury. For example, you may want to:

  • Connect to a data source that only gives you a set of new sales transactions each day, and add these sales transactions to the ones you have already stored in a Power BI dataset
  • Take a snapshot of a data source, like an Excel workbook, that is changing all the time and store each of these daily snapshots of the contents of the workbook in a Power BI dataset

The current Power BI incremental refresh functionality doesn’t make it easy to do either of these things, and that’s why I’ve written this post.

Let me be clear though: in all these cases I recommend that you don’t use the technique I’m going to show you. If possible, you should stage a copy of each day’s data in a relational database (ie build that data warehouse) or even as text files in a folder (Power Automate may be useful to do this) and use that staged copy as the data source for Power BI. This will allow you to do a full refresh of the data in your dataset at any point in the future if you need to, or create a completely new dataset, even though it means you have to do a lot of extra work. If it isn’t possible to do this, or you’re too lazy or you’re just curious to see how my technique works, read on.

For the example I’m going to show in this post I’m going to use a web-based data source, an RSS feed from the BBC News website that returns a list of the current top stories on the site. You can find it here:

http://feeds.bbci.co.uk/news/rss.xml

RSS is based on XML and there’s no authentication needed to access this feed, so loading a table of these top stories into Power BI is very easy – so easy, I’m not going to bother explaining how to do it. Instead I’m going to show you how to use incremental refresh to store a copy of this list of top stories every day in the same table in a dataset.

First of all, for incremental refresh to work, two Power Query parameters of data type Date/Time called RangeStart and RangeEnd need to be created. More details about how to create them can be found in the docs here. Power BI expects them to be used to filter the rows loaded into a table; it also uses them to partition these tables in the dataset.
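
For reference, the M definitions behind these two parameters look something like this (a sketch – the current values shown are arbitrary and the metadata record Power Query generates may differ slightly):

//RangeStart
#datetime(2020, 4, 12, 0, 0, 0) meta [IsParameterQuery = true, Type = "DateTime", IsParameterQueryRequired = true]

//RangeEnd
#datetime(2020, 4, 13, 0, 0, 0) meta [IsParameterQuery = true, Type = "DateTime", IsParameterQueryRequired = true]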

Next you need a Power Query query to load the news stories. Here’s the M code for the query:

let
  //Connect to the BBC News Top Stories RSS feed
  //and create a nicely formatted table
  Source = Xml.Tables(Web.Contents("http://feeds.bbci.co.uk/news/rss.xml")),
  #"Changed Type" = Table.TransformColumnTypes(Source, {{"Attribute:version", Int64.Type}}),
  channel = #"Changed Type"{0}[channel],
  #"Changed Type1" = Table.TransformColumnTypes(
    channel,
    {
      {"title", type text}, {"description", type text}, {"link", type text},
      {"generator", type text}, {"lastBuildDate", type datetime}, {"copyright", type text},
      {"language", type text}, {"ttl", Int64.Type}
    }
  ),
  item = #"Changed Type1"{0}[item],
  #"Changed Type2" = Table.TransformColumnTypes(
    item,
    {{"title", type text}, {"description", type text}, {"link", type text}, {"pubDate", type datetime}}
  ),
  #"Removed Columns" = Table.RemoveColumns(#"Changed Type2", {"guid"}),
  #"Renamed Columns" = Table.RenameColumns(
    #"Removed Columns",
    {{"title", "Title"}, {"description", "Description"}, {"link", "Link"}, {"pubDate", "Publication Date"}}
  ),
  //Find the current date and time when this query runs
  CurrentDateTime = DateTimeZone.FixedUtcNow(),
  //Find yesterday's date
  PreviousDay = Date.AddDays(DateTime.Date(CurrentDateTime), -1),
  //Put the current date and time in a new column in the table
  #"Added Custom" = Table.AddColumn(#"Renamed Columns", "UTC Data Load Date", each CurrentDateTime),
  #"Changed Type3" = Table.TransformColumnTypes(#"Added Custom", {{"UTC Data Load Date", type datetimezone}}),
  //Add the filter required for incremental refresh
  //Only return rows in this table if:
  //a) The RangeStart parameter equals yesterday's date, and
  //b) RangeEnd is not null (which should always be true)
  #"Filtered Rows" = Table.SelectRows(#"Changed Type3", each DateTime.Date(RangeStart) = PreviousDay and RangeEnd <> null)
in
  #"Filtered Rows"

As I said, the first couple of steps aren’t interesting – they just connect to the RSS feed to get the list of top stories as a table. After that:

  • The CurrentDateTime step gets the current date and time at the point when the dataset refreshes. It’s important to note that I’ve used the DateTimeZone.FixedUtcNow function – this not only gives you the current UTC date and time, but it’s guaranteed to give you the same date and time every time you call it within a query (see the sketch after this list). If you use DateTimeZone.UtcNow you may get a different date time returned every time the function is called within the query, which can make things very confusing.
  • The PreviousDay step gets yesterday’s date by extracting the date from the value found in the CurrentDateTime step and subtracting one day.
  • The #”Added Custom” step adds a new column to the table called “UTC Data Load Date” containing the value returned by the CurrentDateTime step. As the name suggests, this column shows when the data in any particular row was loaded.
  • The #”Filtered Rows” step is extremely important. It filters the rows in the table: it will return all the rows in the table if the RangeStart parameter returns yesterday’s date, otherwise it will return no rows at all. It also has an AND condition that checks whether the RangeEnd parameter is not null, which should always be true. Without this step that uses RangeStart and RangeEnd in some way you won’t be able to turn on incremental refresh; the significance of only returning data for yesterday’s date will become clear later.
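
Here’s the sketch mentioned above that illustrates the behaviour of DateTimeZone.FixedUtcNow – you can paste it into a blank query to try it:

let
  //Both of these calls return the same value within a single evaluation...
  FirstCall = DateTimeZone.FixedUtcNow(),
  SecondCall = DateTimeZone.FixedUtcNow(),
  //...so this step always returns true; with DateTimeZone.UtcNow it might not
  AreTheSame = (FirstCall = SecondCall)
in
  AreTheSame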

At this point, unless you happen to have set the value of the RangeStart parameter to yesterday’s date, the table returned by the query will be empty.

Once you have loaded this table to your Power BI dataset (I called it “News”) you need to configure incremental refresh on it. You can do this by right-clicking on the table in the Fields pane in the main Power BI window and selecting Incremental refresh from the menu. Here’s how you will need to configure the settings:

IncrementalConfig

You can ignore the warning about query folding and the M query at the top. What you need to do here is:

  • Set the “Incremental refresh” slider to On
  • Under “Store rows in the last” choose Days from the dropdown and enter the number of days you want to store data for. You probably don’t need to store data indefinitely; if you did, your dataset might get very large. Data that is older than the number of days specified here will be deleted from the dataset.
  • Under “Refresh rows in the last” again select Days from the dropdown and enter 1.
  • Do not check the boxes for “Detect data changes” and “Only refresh complete day”.

With that done all you need to do is publish to the Power BI Service and set up scheduled refresh for once a day (and no more). When the report refreshes each day, all the stories listed in the RSS feed will be added to the existing stories in the dataset. To monitor this I created a measure called Story Count to count the number of rows in the News table, and then created a simple report that showed the total value of this measure and also showed it broken down by the UTC Data Load Date column. Here’s what it looked like yesterday (the 12th of April 2020), after it had already been refreshed for a few days:

ReportYesterday

and here’s what it looked like today (the 13th of April 2020), with 51 more stories having been loaded in this morning:

ReportToday

Job done!

The last thing to point out is that, if the workspace is in Premium, connecting to its XMLA endpoint (or should I say connecting via the “Analysis Services protocol”…?) using SQL Server Management Studio makes it a lot easier to understand what’s happening behind the scenes here. If you right-click on your table in SQL Server Management Studio’s Object Explorer pane and select Partitions:

PartitionsSSMS

…You can see the names of the 51 partitions created to hold the data (remember, above we opted to store 50 days of historical data), the number of rows of data in each partition and the dates that these partitions were last refreshed (or “processed” in Analysis Services terminology):

PartitionRefresh

Notice that the partition for today’s date, April 13th, is empty and that both the partition for today and yesterday (April 12th) have been refreshed today; similarly, the partition for April 11th was last refreshed yesterday on April 12th. The two most recent partitions are always refreshed and this is why the #”Filtered Rows” step in the M code above only returns rows when the RangeStart parameter holds yesterday’s date, to make sure that the stories for the current day are only stored once.

You can also script out the table to TMSL like so:

PartitionsScript

This allows you to see the definitions of each partition and, crucially, the start and end values that are passed to the RangeStart and RangeEnd parameters when they are refreshed:

TOM

You can download my example pbix file for this post here.
