Optimising The Performance Of Power Query Merges In Power BI, Part 2: Does When You Remove Columns Matter?

In my last post I demonstrated how the size of a table affects the performance of Power Query merge operations on non-foldable data sources in Power BI. Specifically, I showed that removing columns from the tables involved in a merge before the merge took place improved performance. But does it matter when you remove the columns? Is it enough to only select the columns you need when you expand the nested table returned by a merge, for example, or just to remove columns after the merge step? So, today’s question is:

Does it make a difference to Power Query merge performance if you remove unwanted columns from your source tables in the step before the merge or in the step afterwards?

The first problem I had to deal with when testing was that, as I said in my previous post, even when I reduced the number of columns in the million-row table I was using, all my queries were exceeding the 256MB container size limit and paging was taking place. This meant that any refresh timings I took in Profiler varied by up to 1-2 seconds from run to run. When you’re comparing queries that take around 50 seconds with queries that take around 10 seconds, a bit of variability in the timings from run to run doesn’t matter much. However, if you’re trying to compare two different test scenarios whose performance may not differ that much anyway, it becomes more of a problem.

To deal with this I reduced the number of rows in the tables I was using in my merge to 300,000 rows. This kept container memory usage under 256MB and reduced the variability in timings to a minimum. Here are the average timings across multiple runs for the last scenario tested in my previous post – where I removed all but one column in my source tables before the merge and then counted the rows in the table returned by the merge – with these 300,000 row source tables:

  • Progress Report End/25 Execute SQL – 2.4 seconds
  • Progress Report End/17 Read Data – 0 seconds
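
I haven’t shown the modified 300,000-row source queries here, but a minimal sketch of how the row limit could be applied is below. This assumes the same NumbersMoreColumns.csv file and query structure from part 1; the new step name is my own, and the column-removal step would be added or left out depending on the scenario being tested:

let
  Source = Csv.Document(
    File.Contents("C:\NumbersMoreColumns.csv"), 
    [Delimiter = ",", Columns = 7, 
     Encoding = 65001, QuoteStyle = QuoteStyle.None]
  ),
  #"Promoted Headers" = 
   Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
  #"Changed Type" = Table.TransformColumnTypes(
    #"Promoted Headers", 
    {
      {"A", Int64.Type}, {"B", Int64.Type}, {"C", Int64.Type}, 
      {"D", Int64.Type}, {"E", Int64.Type}, {"F", Int64.Type}, 
      {"G", Int64.Type}
    }
  ),
  // Keep only the first 300,000 rows so the merge stays under the 256MB container limit
  #"Kept First Rows" = Table.FirstN(#"Changed Type", 300000)
in
  #"Kept First Rows"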

I then changed my source queries so they again returned all seven columns and changed the merge query so that it removed all but the two A columns in a Removed Other Columns step after the merge and expand steps, like so:


let
  Source = Table.NestedJoin(
   First, {"A"}, Second, {"A"}, 
   "Second", JoinKind.Inner),
  #"Expanded Second" = Table.ExpandTableColumn(
    Source, 
    "Second", 
    {"A", "B", "C", "D", "E", "F", "G"}, 
    {"Second.A", "Second.B", "Second.C", 
     "Second.D", "Second.E", "Second.F", "Second.G"}
  ),
  #"Removed Other Columns" = 
   Table.SelectColumns(#"Expanded Second", {"A", "Second.A"}),
  #"Counted Rows" = Table.RowCount(#"Removed Other Columns")
in
  #"Counted Rows"

The average timings for this version of the query were identical to those from the previous test:

  • Progress Report End/25 Execute SQL – 2.4 seconds
  • Progress Report End/17 Read Data – 0 seconds

Last of all, I removed all but column A from the source query called First, did not remove any columns from the source query called Second, and then only expanded column A in the #”Expanded Second” step in the merge query:


let
  Source = Table.NestedJoin(
   First, {"A"}, Second, {"A"}, 
   "Second", JoinKind.Inner),
  #"Expanded Second" = 
   Table.ExpandTableColumn(Source, "Second", {"A"}, {"Second.A"}),
  #"Counted Rows" = Table.RowCount(#"Expanded Second")
in
  #"Counted Rows"

Here are the average timings from this test:

  • Progress Report End/25 Execute SQL – 2.6 seconds
  • Progress Report End/17 Read Data – 0 seconds

These timings were slightly slower, but I don’t think the 200ms increase is really significant.

Therefore, from these tests, I think it’s safe to conclude that:

Removing unwanted columns in a step immediately after a merge results in the same performance benefits as removing unwanted columns in a step immediately before a merge

These results may surprise some of you. Why does removing columns after a merge perform so well? The answer is that just because you remove columns from a table in the step after a merge, it doesn’t mean that the Power Query engine actually removes the columns after it has performed the merge. On the contrary, it’s clever enough to realise that if you remove columns in the step after a merge then there’s no need for those columns to be present at the point where it performs the merge. You probably know how query folding works with certain data sources – well, it turns out that something similar to query folding happens within the Power Query engine whatever data source you’re using. The best explanation of this behaviour – which is fundamental to understanding how the Power Query engine works – can be found in the section on “Streaming Semantics” in this blog post by Ben Gribaudo in his excellent series on M:

https://bengribaudo.com/blog/2018/02/28/4391/power-query-m-primer-part5-paradigm

Of course there may be scenarios where this does not happen, or doesn’t happen efficiently, so I still recommend removing unwanted columns as early as possible in your queries just to be safe.
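
As an illustration of the kind of scenario where this might not happen – and this is my own example, based on my understanding of how Table.Buffer works rather than anything official – buffering the table between the merge and the column removal forces the whole expanded table to be held in memory, so the removal of columns can no longer be pushed back before the merge:

let
  Source = Table.NestedJoin(First, {"A"}, Second, {"A"}, "Second", JoinKind.Inner),
  #"Expanded Second" = Table.ExpandTableColumn(
    Source, 
    "Second", 
    {"A", "B", "C", "D", "E", "F", "G"}, 
    {"Second.A", "Second.B", "Second.C", "Second.D", 
     "Second.E", "Second.F", "Second.G"}
  ),
  // Table.Buffer materialises all the expanded columns in memory...
  Buffered = Table.Buffer(#"Expanded Second"),
  // ...so this column removal now happens only after the merge has been fully evaluated
  #"Removed Other Columns" = Table.SelectColumns(Buffered, {"A", "Second.A"}),
  #"Counted Rows" = Table.RowCount(#"Removed Other Columns")
in
  #"Counted Rows"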

Optimising The Performance Of Power Query Merges In Power BI, Part 1: Removing Columns

Merging (or, in SQL terms, joining) tables in Power Query is a common cause of refresh performance problems. I’ve often wondered whether there’s anything you can do to optimise the performance of merges for non-foldable data sources, and so I decided to run some tests to try to answer all the questions I had. In this series of posts I’ll tell you what I found.

For these tests the only data source I used was a CSV file with one million rows and seven numeric columns named A, B, C, D, E, F and G:

Csv

I used SQL Server Profiler to measure the amount of time taken for a query to execute using the technique I blogged about here. If you read that post (and I strongly recommend you do) you’ll see there are actually two Profiler events whose duration is significant when measuring refresh performance:

  • Progress Report End/25 Execute SQL
  • Progress Report End/17 Read Data

It also turns out that these two events provide some insight into how certain transformations are handled in the Power Query engine, but we’ll come to that later.

The first question I decided to investigate was this:

Does the number of columns in a table affect the performance of a merge?

First of all, I created two identical queries called First and Second that connected to the CSV file, used the first row from the file as the headers, and set the data types of all seven columns to Whole Number. Nothing very special, but here’s the M code all the same:

let
  Source = Csv.Document(
    File.Contents("C:\NumbersMoreColumns.csv"), 
    [Delimiter = ",", Columns = 7, 
     Encoding = 65001, QuoteStyle = QuoteStyle.None]
  ),
  #"Promoted Headers" = 
   Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
  #"Changed Type" = Table.TransformColumnTypes(
    #"Promoted Headers", 
    {
      {"A", Int64.Type}, 
      {"B", Int64.Type}, 
      {"C", Int64.Type}, 
      {"D", Int64.Type}, 
      {"E", Int64.Type}, 
      {"F", Int64.Type}, 
      {"G", Int64.Type}
    }
  )
in
  #"Changed Type"

I disabled these queries so that they were not loaded into the dataset.

Next, I created a third query that used the Table.NestedJoin function to merge the data from these two queries using an inner join and return all of the columns from both source queries:

let
  Source = Table.NestedJoin(
   First, {"A"}, Second, {"A"}, 
   "Second", JoinKind.Inner),
  #"Expanded Second" = Table.ExpandTableColumn(
    Source, 
    "Second", 
    {"A", "B", "C", "D", "E", "F", "G"}, 
    {"Second.A", "Second.B", "Second.C", 
     "Second.D", "Second.E", "Second.F", "Second.G"}
  )
in
  #"Expanded Second"

MergeAllCols

When I refreshed this query, in Profiler the two events I mentioned above had the following durations:

  • Progress Report End/25 Execute SQL – 40 seconds
  • Progress Report End/17 Read Data – 56 seconds

Pretty slow. But what is performance like when you merge two tables with one column instead of seven?

To test this, I added an extra step to the First and Second queries that removed all but the A columns (the ones needed for the merge) like so:

let
  Source = Csv.Document(
    File.Contents("C:\NumbersMoreColumns.csv"), 
    [Delimiter = ",", Columns = 7, 
     Encoding = 65001, QuoteStyle = QuoteStyle.None]
  ),
  #"Promoted Headers" = 
   Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
  #"Changed Type" = Table.TransformColumnTypes(
    #"Promoted Headers", 
    {
      {"A", Int64.Type}, 
      {"B", Int64.Type}, 
      {"C", Int64.Type}, 
      {"D", Int64.Type}, 
      {"E", Int64.Type}, 
      {"F", Int64.Type}, 
      {"G", Int64.Type}
    }
  ),
  #"Removed Other Columns" = 
   Table.SelectColumns(#"Changed Type", {"A"})
in
  #"Removed Other Columns"

Just A

I then updated the third query that contained the merge to reflect this change:

let
  Source = Table.NestedJoin(
   First, {"A"}, Second, {"A"}, 
   "Second", JoinKind.Inner),
  #"Expanded Second" = 
   Table.ExpandTableColumn(Source, "Second", {"A"}, {"Second.A"})
in
  #"Expanded Second"

Merge

When this query was refreshed, Profiler showed the following durations:

  • Progress Report End/25 Execute SQL – 9 seconds
  • Progress Report End/17 Read Data – 1 second

This query is a lot quicker, but then I thought: what if the performance is more to do with the size of the table returned by the query than with the merge itself? So I added an extra step to the end of the merge query, like so:

let
  Source = Table.NestedJoin(
   First, {"A"}, Second, {"A"}, 
   "Second", JoinKind.Inner),
  #"Expanded Second" = 
   Table.ExpandTableColumn(Source, "Second", {"A"}, {"Second.A"}),
  #"Counted Rows" = Table.RowCount(#"Expanded Second")
in
  #"Counted Rows"

Rowcount

…and then reran the two tests above. My thinking was that, since the merge query now only returns a single value, the amount of data returned by the query should not be a factor in the duration of the queries.

Here are the timings for the version with the merge on the tables with seven columns:

  • Progress Report End/25 Execute SQL – 56 seconds
  • Progress Report End/17 Read Data – 0 seconds

Here are the timings for the version with the merge on the tables with just one column:

  • Progress Report End/25 Execute SQL – 14 seconds
  • Progress Report End/17 Read Data – 0 seconds

This does seem to confirm that the number of columns in a table affects the performance of a merge, although of course it might be that it takes longer to count the rows of a table that has more columns.
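
One way to check that suspicion would be a control test that counts the rows of one of the seven-column source queries without any merge at all. I didn’t run this test, so treat the following as a suggestion rather than a result:

let
  // No merge: just count the rows of the seven-column First query on its own,
  // to see how much of the duration is due to the row count rather than the join
  Source = First,
  #"Counted Rows" = Table.RowCount(Source)
in
  #"Counted Rows"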

This shows something else too: Read Data is more or less instant in both cases, whereas in the first two tests – where the queries returned a table rather than a single value – it took a measurable amount of time.

Why does the number of columns influence the performance of a merge? If you read my recent post on monitoring memory usage with Query Diagnostics, you’ll remember that merges have to take place in memory – so I guess the larger the tables involved in the merge, the more memory is needed and the more paging happens if the 256MB limit is exceeded. Looking at the performance counter data generated for the last two queries showed that the 256MB limit was indeed exceeded in both cases, but while the version merging the single-column tables had a maximum commit of 584MB, the maximum commit for the version merging the seven-column tables was almost 3GB.

That’s enough for now. At this point I think it’s fair to say the following:

Removing any unwanted columns before you merge two tables in Power Query will improve refresh performance.

Limit The Amount Of Data You Work With In Power BI Desktop Using Parameters And Deployment Pipelines

If you’re working with large amounts of data in Power BI you may find that you have problems because:

  • Your pbix file is very large
  • You spend a long time waiting for refreshes to finish in Power BI Desktop – and if you’re developing, you may need to refresh your dataset frequently
  • It takes a long time to publish your dataset to the Power BI Service

Wouldn’t it be great if there was a way to work with a small subset of your data in Power BI Desktop and then, after you publish, load all the data when you refresh? The good news is that this is now possible with the new deployment pipelines feature in Power BI!

Assuming that you know the basics of how deployment pipelines work (the documentation is very detailed), here’s a simple example of how to do this. Let’s say that you want to use data from the FactInternetSales table in the Adventure Works DW 2017 SQL Server sample database in your dataset. When you import the data from this table and open the Advanced Editor to look at the M code for the query, here’s what you’ll see:

let
  Source = Sql.Databases("MyServerName"),
  AdventureWorksDW2017 = Source{[Name = "AdventureWorksDW2017"]}[Data],
  dbo_FactInternetSales = 
   AdventureWorksDW2017{[Schema = "dbo", Item = "FactInternetSales"]}[Data]
in
  dbo_FactInternetSales

This query, of course, imports all the data from this table. To cut it down to a smaller size, the first thing to do is to create a new Power Query parameter (called FilterRows here) of data type Decimal Number:

Parameter
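
Behind the scenes a parameter is just another M query that returns a value with some metadata attached. As a rough sketch – the exact metadata record is generated for you by the Manage Parameters dialog, so treat this as an approximation – the M for the FilterRows parameter looks something like this:

// Current Value = 5; the metadata record is what marks this query as a parameter
5 meta [
  IsParameterQuery = true, 
  Type = "Number", 
  IsParameterQueryRequired = true
]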

Notice that the Current Value property is set to 5. The purpose of this parameter is to control the number of rows from FactInternetSales that are loaded into the dataset. Here’s an updated version of the Power Query query above that uses this parameter:

let
  Source = Sql.Databases("MyServerName"),
  AdventureWorksDW2017 = Source{[Name = "AdventureWorksDW2017"]}[Data],
  dbo_FactInternetSales = 
   AdventureWorksDW2017{[Schema = "dbo", Item = "FactInternetSales"]}[Data],
  FilterLogic = 
    if FilterRows <= 0 then 
      dbo_FactInternetSales 
    else 
      Table.FirstN(dbo_FactInternetSales, FilterRows)
in
  FilterLogic

A new step called FilterLogic has been added at the end of this query that implements the following logic:

  • If the FilterRows parameter is less than or equal to 0 then return all the rows in the FactInternetSales table, otherwise
  • If FilterRows is more than 0 then return that number of rows from the table

Given that the FilterRows parameter is set to 5, this means that the query now returns only the top 5 rows from FactInternetSales:

Top5

It’s important to point out that a filter like this will only make your refreshes faster if the Power Query engine is able to apply the filter without reading all the data in the table itself. In this case it can: with a SQL Server data source query folding ensures that the SQL query generated for the refresh only returns the top 5 rows from the FactInternetSales table:

SQL

Here’s a simple report with a card that shows the number of rows loaded into the table:

Top5Report

At this point you have your cut-down dataset for development in Power BI Desktop.

Next, publish this dataset and report to a workspace that is assigned to the Development slot in a deployment pipeline and then deploy them to the Test workspace:

Pipeline

Then click the button highlighted in the screenshot above to create a new dataset rule that changes the value of the FilterRows parameter to 0 when the dataset is deployed to the Test workspace:

ParameterRule

With this rule in place, when the dataset in the Test workspace is refreshed, the logic in the query above now ensures that all the data from the FactInternetSales table is loaded into the dataset. Instead of just 5 rows, the report now shows that the full 60000 rows of data have been loaded:

FullDataReport

Monitoring Power Query Memory Usage With Query Diagnostics In Power BI

In the April release of Power BI Desktop the Power Query Query Diagnostics feature was enhanced so that you can now return performance counter data. As the blog post says:

When you run performance counters, every half second Power Query will take a snapshot of resource utilization. This isn’t useful for very fast queries but can be helpful for queries that use up a lot more resources.

When might this be useful in the real world? In my last blog post I had a chart that shows the amount of data that Power Query reads from disk while loading a large-ish JSON file, created from data collected in Process Monitor using this technique. Here it is again:

SecondLoad

The x axis is relative time in seconds from when Power Query started reading the data; the y axis shows the amount of data read. Notice how data is read at a constant rate for the first 1.5 seconds, but that after the 1.5 second mark the throughput flattens? What could be causing this?

Although this does not happen consistently, and I didn’t collect the necessary data when I ran this particular test, the answer is likely to be related to how the Power Query engine uses memory.

This is a topic I have blogged about before and I strongly recommend that before carrying on you read this post on the Container Size property that can be set on dataflows in Power BI Premium. Here’s a quote from Curt Hagenlocher of the Power Query dev team from that post that is relevant here too, about how the Power Query engine uses memory when a query executes:

Certain operations force the rows of a table value to be enumerated. If the enumeration operation itself is expensive, then using Table.Buffer can be a performance optimization because we store the values in memory so that second and subsequent enumerations of the rows go against memory.

If the table is only being enumerated once (which is the most common scenario) or if the underlying enumeration is fast anyway, then Table.Buffer won’t help performance.

Table.Buffer can actually hurt performance in some cases, because we cap RAM usage of the query at 256 MB — which means that a query which uses more than 256 MB is now forced to page RAM to/from disk. Enough paging, and the performance cost can be quite dramatic.

Currently, “table at a time” operations like joins, sort, many groupings, pivot, unpivot, etc., all happen in RAM (unless folded). For large tables, these will consume a lot of memory.
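
To make the trade-off Curt describes concrete, here’s a minimal sketch of my own – the file path and column name are hypothetical – of the one situation where Table.Buffer can help, namely a table that gets enumerated more than once; remember, though, that buffering a table that needs more than 256MB will itself cause paging:

let
  Source = Csv.Document(File.Contents("C:\SomeLargeFile.csv")),
  #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
  // Buffer once so that the two enumerations below read from memory
  // instead of potentially re-evaluating the steps above twice
  Buffered = Table.Buffer(#"Promoted Headers"),
  RowCount = Table.RowCount(Buffered),
  DistinctValues = List.Count(List.Distinct(Buffered[SomeColumn])),
  Output = #table({"Rows", "DistinctValues"}, {{RowCount, DistinctValues}})
in
  Output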

Now it turns out that when Power Query reads a JSON file it has to load the whole file into memory – something that isn’t true for other types of data source. So maybe, while reading the JSON file used in my test, the slow-down in throughput was caused by paging?

With the option to collect performance counters turned on (see the announcement blog post for how to do this) I opened up the pbix file containing the query from my last blog post, went to the Power Query Editor, and collected some query diagnostics data by right-clicking on the last step in my query and selecting Diagnose:

Diagnose Step

It soon became clear that the Power Query engine has a pool of mashup containers that it reuses (yes, go and read that blog post I told you to read!) and it seems like they often need to do some garbage collection before a query runs. So, to get a nice-looking graph, I did something highly unsupported that still seemed to work: I opened Task Manager and killed all the Microsoft Mashup Evaluation Container processes I could see under Power BI. Having done this, when I collected my performance counter data I could build the following graph showing Power Query memory usage while the query was evaluating:

Memory

The x axis shows the number of seconds elapsed since the start of the query; the y axis shows the value in bytes for the Commit and Working Set performance counters. The yellow line, for the Commit (bytes) performance counter, shows the amount of virtual memory used by Power Query. The blue line, for the Working Set (bytes) performance counter, shows the amount of physical memory used by Power Query; as you can see it reaches 256MB (indicated by the red dotted line) halfway through and never exceeds that. While Commit is greater than Working Set, paging must be happening and Power Query performance may suffer as a result.

Over the years I have found that one of the most effective ways of improving Power Query performance is to try to avoid transformations, like those that Curt mentions above, that require large tables to be held in memory – although that’s easier said than done. At least now we have an easy way of seeing where memory might be causing problems for Power Query performance using this new feature.

Speed Up Data Refresh Performance In Power BI Desktop Using Table.View

It can sometimes be frustrating to work with slow data sources or complex Power Query queries in Power BI Desktop: you open the Power Query Editor, make some changes, click Close & Apply and then wait a loooong time for your data to refresh. In this post I’m going to show you a technique that can cut this wait by up to 50%. It involves some fairly complex M code but I promise you, the effort is worth it!

In fact, what I’m going to describe is more or less what I showed towards the end of my appearance on Guy In A Cube last year and in a few other posts, but at that time I didn’t understand properly why it worked or what the performance implications actually were. Now, thanks to a lot of help from Curt Hagenlocher of the Power Query development team I have all the details I need to blog about it.

Let’s see a simple example. Say you have a large JSON file – for this test I generated one that is 67MB containing random data using this handy online tool – and you want to load it into Power BI. You’ll end up with a query that looks something like this:

let
  Source = Json.Document(File.Contents("C:\generated.json")),
  #"Converted to Table" = Table.FromList(
    Source, 
    Splitter.SplitByNothing(), 
    null, 
    null, 
    ExtraValues.Error
  ),
  #"Expanded Column1" = Table.ExpandRecordColumn(
    #"Converted to Table", 
    "Column1", 
    {"_id", "index", "guid", "isActive", "balance"}, 
    {"_id", "index", "guid", "isActive", "balance"}
  ),
  #"Changed Type" = Table.TransformColumnTypes(
    #"Expanded Column1", 
    {
      {"_id", type text}, 
      {"index", Int64.Type}, 
      {"guid", type text}, 
      {"isActive", type text}, 
      {"balance", type text}
    }
  )
in
  #"Changed Type"

Here’s what the output of this query looks like:

Output

On my laptop this query takes around 7-8 seconds to load. Using Process Monitor to see how much data is being read from the file (see this post for more details) shows that Power Query is reading the data from this file twice. This graph, generated using data from Process Monitor, has time on the x axis and amount of data read from the file on the y axis, and the two peaks indicate that the data is being read twice:

FirstLoad

Why? When you refresh a table in Power BI Desktop two things happen:

  • First, Power BI has to check what columns are present in the table. To do this, it runs the query for the table but asks the Power Query engine to filter the table so it returns zero rows.
  • Secondly the query is run again but this time with no filter, so all rows are returned, and this is when the data is actually loaded into the table in Power BI.
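
In effect – and this is my simplification, not official documentation – on that first run Power BI evaluates the equivalent of a top-0 filter over the query, just to discover the column names and types:

let
  // Stand-in for the real, slow query – imagine this is the JSON query above
  SlowQuery = #table(
    type table [_id = Text.Type, index = Int64.Type, balance = Text.Type], 
    {{"abc", 1, "$100"}}
  ),
  // The schema-check run effectively asks for zero rows, which is why intercepting
  // a zero-row filter (via OnTake, as shown later in this post) makes it instant
  SchemaCheck = Table.FirstN(SlowQuery, 0)
in
  SchemaCheck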

It’s on this first run of the query where problems can occur. If you’re using a data source like SQL Server and you’re not doing any complex transformations then query folding will take place, so Power BI’s request to return the table but with no rows can be handled very efficiently. However, if you’re using a data source where query folding is not possible (such as Excel, CSV or JSON files) or you have complex transformations that stop folding taking place then the only way Power Query can work out what columns the query returns is by running the entire query. That’s what is happening in the example above.

This only happens in Power BI Desktop, though. When you refresh a dataset in the Power BI Service the query is only run once.

The solution is to trick the Power Query engine into running these zero-row filter queries instantly, and you can do that using the Table.View M function. I blogged about Table.View here (and I strongly suggest you read that post before carrying on) but that example didn’t quite get to the solution you need here. Here’s a new version of the query above with an extra step:

let
  Source = Json.Document(File.Contents("C:\generated.json")),
  #"Converted to Table" = Table.FromList(
    Source, 
    Splitter.SplitByNothing(), 
    null, 
    null, 
    ExtraValues.Error
  ),
  #"Expanded Column1" = Table.ExpandRecordColumn(
    #"Converted to Table", 
    "Column1", 
    {"_id", "index", "guid", "isActive", "balance"}, 
    {"_id", "index", "guid", "isActive", "balance"}
  ),
  #"Changed Type" = Table.TransformColumnTypes(
    #"Expanded Column1", 
    {
      {"_id", type text}, 
      {"index", Int64.Type}, 
      {"guid", type text}, 
      {"isActive", type text}, 
      {"balance", type text}
    }
  ),
  OverrideZeroRowFilter = Table.View(
    null, 
    [
      GetType = () => 
        type table[
          _id = Text.Type, 
          index = Int64.Type, 
          guid = Text.Type, 
          isActive = Text.Type, 
          balance = Text.Type
        ], 
      GetRows = () => 
        #"Changed Type", 
      OnTake = (count as number) => 
        if count = 0 then 
          #table(
            type table[
              _id = Text.Type, 
              index = Int64.Type, 
              guid = Text.Type, 
              isActive = Text.Type, 
              balance = Text.Type
            ], 
            {}
          ) 
        else 
          Table.FirstN(#"Changed Type", count)
    ]
  )
in
  OverrideZeroRowFilter

The OverrideZeroRowFilter step is where the magic happens. It uses Table.View to override query folding behaviour by intercepting what happens when the table returned by the query is filtered. There are three fields in the record in the second parameter of Table.View that you need to change:

  • GetType returns a table type that describes the columns and their data types present in the output of the query. The five columns listed here are the five columns you can see in the screenshot of the query output above. It’s very easy to generate the required M code when you use the custom function that I blogged about here.
  • GetRows returns all the rows that the query can return, that’s to say the table returned by the #”Changed Type” step.
  • OnTake is used when a top n filter is applied to the table returned by the query.  In this case the code looks at the number of rows that are being requested (specified in the count parameter) and if that’s zero, it just returns an empty table with the same columns as the #”Changed Type” step; if it’s more than zero then it uses Table.FirstN to get the actual number of rows requested from #”Changed Type”. This means that when Power BI does that zero-row filter it can now happen immediately because there’s no need to go back to the data source or execute any of the transformations in the query.

This version of the query now runs in 4-5 seconds, and Process Monitor now shows that the JSON file is only read once and, obviously, reading the data once is a lot faster than reading it twice:

SecondLoad

Look at how the graph flattens at the end… but that’s something for another blog post.

There is a downside to this approach: you have to hard-code the table schema that you expect your query to return, and if you change your query to return different columns you’ll have to update the table type in the last step.

I’ve used this technique on other slow queries and it has made a massive difference to the development experience in Power BI Desktop – one query that was taking five minutes to refresh when I closed the Power Query Editor went down to two and a half minutes. If you try this out yourself please let me know if it makes a difference by leaving a comment.

[Don’t forget that there are other things you can do that will also speed up the development experience for slow queries: you should definitely turn off the “Allow data preview to download in the background” option, and you might want to consider turning off data privacy checks so long as you fully understand the implications.]

Power Query Formatter

If you’ve ever used DAX Formatter to format your DAX code and wondered why there isn’t an equivalent for Power Query/M, then wonder no more: the nice people at Power Pivot Insights in Germany (read their German-language blog here) have built one. You can find it here:

https://powerqueryformatter.com/

PQ Formatter

They also have an API you can use.

Bonus information: did you know that Microsoft also has an open source parser for M, available here?

Calling The Power BI Export API From Power Automate, Part 3: Creating An Alerting Solution

In my last two posts (see here and here) I showed you how you can call the new endpoints in the Power BI REST API for exporting a report. There are some obvious, extremely useful applications for this such as emailing PDF exports of a report out to large groups of users. In this post, however, I’ll show you how you can use this for something different: creating a Power BI alerting solution.

Now I know what you’re thinking: we already have alerts in Power BI and we can also trigger a Power Automate flow when an alert fires. This functionality is a bit limited though: you can only create an alert on a dashboard, not a report, and then only on some visuals; what’s more you can only use fairly simple rules to trigger an alert. You may need something more sophisticated, but while Power Automate would seem to be the perfect tool for building an alerting solution there’s another problem: it isn’t possible to query a Power BI dataset from Power Automate… until now.

How? The new export endpoints in the Power BI REST API allow you to export a paginated report to a number of different formats, one of which is XML. This means you can write any DAX query you want, use it in a table in a paginated report, export the paginated report to XML in Power Automate and bingo – you have the output of the query in a format that Power Automate can read and do something useful with.

Let’s see an example. Take the following table of data in a Power BI dataset:

Source data

It shows sales for different countries, and let’s say that if the sales value exceeds the given threshold then you want to send an email to the address given in the last column.

The following DAX query filters this table to return all the rows where sales is greater than the threshold:

EVALUATE
FILTER(
    'Sales',
    'Sales'[Sales] > 'Sales'[Threshold]
)

Query output

It’s quite easy to create a basic paginated report in Power BI Report Builder with just a single tablix to display the output of this query:

Paginated report output

Now, let’s take the Power Automate flow that I described in my last post and alter it slightly.

First of all, instead of exporting to CSV as I did last time, you need to change the action that calls the Export To File endpoint to export the report to XML:

XML export option

More substantial changes are needed at the end of the flow, where the exported report is returned. Here’s what this part of the flow looks like at a high level after the changes:

Overview End

The first action shown here, Download the exported report file, gets the XML returned from Power BI. The SSRS documentation has a lot of detail about how a report gets rendered to XML here, but by keeping the report very basic it’s easy to understand the format of the XML. Here’s what gets returned in this case:

<?xml version="1.0" encoding="UTF-8"?>
<Report xmlns="AlertsPaginatedReport" 
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
        xsi:schemaLocation="AlertsPaginatedReport https://redacted" 
        Name="AlertsPaginatedReport">
  <Tablix1>
    <Details_Collection>
      <Details Country="UK" Sales="5" Threshold="2" Email="chris@mycompany.com" />
      <Details Country="New Zealand" Sales="6" Threshold="1" Email="phil@mycompany.com" />
      <Details Country="Netherlands" Sales="5" Threshold="2" Email="kasper@mycompany.com" />
    </Details_Collection>
  </Tablix1>
</Report>

The approach I’ve used to consume this XML in Power Automate is basically the one described in this post I found. The Compose action uses an XPath query to return an array containing one item for each row in the query results; writing the XPath query was quite painful but I finally got it working. Here’s the expression from the action:

xpath(
  xml(body('Download_the_exported_report_file')),
  '/*[local-name() = ''Report'']/*[local-name() = ''Tablix1'']/*[local-name() = ''Details_Collection'']/*[local-name() = ''Details'']'
)

Next, an Apply to each is used to iterate over each item in this array:

Apply to each

Finally, inside this loop, another Compose action retrieves the email address from the current iteration and this is then used to send an email:

Get email

Here’s the expression used in the Get Email Address action:

xpath(
  xml(item()),
  'string(/*[local-name() = ''Details'']/@Email)'
)

And that’s it. It’s a bit of a convoluted workaround, I admit, but it does the job; please also bear in mind the limitations of the export API listed here.

I’m sure there are lots of other things, apart from alerting, that you can do with the ability to consume the output of a DAX query in Power Automate, so if you have any good ideas please let me know in the comments!

Calling The Power BI Export API From Power Automate, Part 2: Creating A Flow That Exports A Paginated Report To A CSV File

In my last post I showed how to create a Power Automate custom connector for the new endpoints in the Power BI REST API for exporting a report to a file. In this post I’ll show you how to use this custom connector in a flow in Power Automate.

The three endpoints in the Power BI API that need to be called to export a report (and which have been included in the custom connector from my last post) are as follows:

  • Export To File In Group must be called first, to start the process of exporting either a Power BI report or a paginated report to a file. Exporting may take some time, though, so the exported file doesn’t get returned at this point. In the headers of the response there is a retry-after value in seconds telling you how long you must wait initially, before…
  • …calling Get Export To File Status In Group to find out whether the export is ready or not. If it isn’t ready you again need to wait the number of seconds specified in the retry-after header before checking the status again. If it is ready, you can then…
  • …call Get File Of Export To File In Group to download the export file.

This is all described in the C# example code here, so the challenge is to translate this into a flow. In my case, I want to export the following paginated report to a CSV file:

Paginated report

Here’s what my flow looks like at the top level:

High level view

To make things easy to understand I’m using a manual trigger to start the flow and hard-coding the report I want to export. The second Action calls Export To File In Group:

Start export

You will need to go into the Settings for this Action and set the Asynchronous Pattern option to Off:

AynchOff

As I said, in the response from this first API call the retry-after header tells you how many seconds you should wait before checking the export status for the first time. This value is used in the next Action, which adds the delay:

First delay

Once this delay has passed there’s a Do loop that will call the Get Export To File Status In Group endpoint up to ten times to check if the export is ready:

Do loop

This call uses the export id found in the response from the call to Export To File In Group:

Check status

If this call returns that the status is Running or NotStarted, then it will again wait for the number of seconds specified in the retry-after header; if not then there is no delay:

Second delay

The expression on the Do loop breaks the loop if the export status is Succeeded or Failed:

@or(
  equals(body('Call_Export_Status_endpoint_to_check_if_export_is_ready')?['status'], 'Succeeded'),
  equals(body('Call_Export_Status_endpoint_to_check_if_export_is_ready')?['status'], 'Failed')
)

After the loop, there is a check to see if the export status is Succeeded:

Last condition

If the export status is Succeeded, then the file is downloaded and saved to OneDrive for Business:

Save file

Otherwise, the flow sends a failure notification email:

Failure email

And here’s the exported CSV file viewed in Excel Online:

Excel

And that’s it. I know this flow isn’t as robust as it could be, but I hope it gives you an idea of how to use the custom connector to export Power BI reports in Power Automate; as I said last time, I have very little Power Automate experience so please forgive any newbie mistakes! If you can see a way to improve on what I’ve done here, please leave a comment.

Automating the export of a Power BI paginated report to a CSV file is already very useful, but there are other fun things we can do with this – and in my next post I’ll show you another, less obvious example.

Calling The Power BI Export API From Power Automate, Part 1: Creating A Custom Connector

Recently, a new set of endpoints in the Power BI REST API for exporting Power BI reports and paginated reports to files went into public preview (see the main announcement here and the paginated reports announcement here). Since there are all kinds of cool things you can do with this I thought I would write a few posts on how to use these endpoints in Power Automate. In this post I’ll talk about setting up a custom connector in Power Automate; in the next post I’ll talk about how to use this custom connector in Power Automate; and after that I’ll show you some less obvious uses for all this.

Before we go any further, and before you get too excited, you should read the limitations of the public preview listed here, especially those around the number of report pages that can be exported per hour and the number of pages in a report that can be exported. Also, this functionality is only available with Power BI Premium or Power BI Embedded.

Creating a Power Automate custom connector for the Power BI REST API is something that several people have blogged about already in detail. Konstantinos Ioannou has a very detailed walkthrough here; Jese Navaranjan has a video walkthrough here; and I blogged about how you could use the Swagger definition of the Power BI REST API to create a custom connector here. I’m not going to go over all these steps again but there are a few specifics that need pointing out if you want to build your custom connector manually. If you’re lazy, I exported my custom connector to a Swagger file and you can download it here; you should be able to import it and create your own custom connector very easily.  I don’t pretend to be a Power Automate expert so please excuse any newbie mistakes!

The three endpoints that you’ll need to use to export to a file (unless your report is in your My Workspace, in which case there are three other equivalent endpoints) are Export To File In Group, Get Export To File Status In Group and Get File Of Export To File In Group. There’s good documentation for regular Power BI reports here and paginated reports here, but in summary Export To File In Group starts the export process but because this might take a long time, doesn’t return the exported file; Get Export To File Status In Group allows you to check the status of the export; and Get File Of Export To File In Group returns the file once the export is ready.

Creating a custom connector in Power Automate is a four-step process and steps 1, 2 and 4 (“General”, “Security” and “Test”) are well covered in the guides above. Step 3 (“Definition”) is where you need to create three Actions for the three endpoints above.

After filling in the General and Security pages, go to the Definition page and click the New action button to create an Action for the Export To File In Group endpoint. You first need to fill in the information in the General section:

ExportFileGeneral

Then in the Request section click the Import from sample button, select POST and paste

https://api.powerbi.com/v1.0/myorg/groups/{groupId}/reports/{reportId}/ExportTo

…into the URL box and the sample payload here into the Body box:

ExportFileRequest

Setting up the other two endpoints is similar except that you need to select GET instead of POST and you don’t need to paste anything into the Body box. Here are the two templatised URLs to use for them:

Get Export To File Status In Group: https://api.powerbi.com/v1.0/myorg/groups/{groupId}/reports/{reportId}/exports/{exportId}

Get File Of Export To File In Group: https://api.powerbi.com/v1.0/myorg/groups/{groupId}/reports/{reportId}/exports/{exportId}/file

There’s more to do on this page, but at this point you should go to the Test page and test these three new Actions. For Export To File In Group you need to at least enter a groupId, a reportId and a format:

TestBox

The groupId and reportId can be found by navigating to your report in the Power BI portal; you can extract the groupId and reportId from the URL like so:

URLgroupsreports

The valid values for format (and the other parameter values) are in the docs.

Assuming that it all works, you will be able to scroll down and see the response. Copy all the JSON in the Body section and save it somewhere temporarily.

ExportId

Do the same thing for the other two Actions you have created; these two actions take a third parameter called exportId, which is the id value in the response that I’ve highlighted in the screenshot immediately above.

Once you have the Body text from Export To File In Group and Get Export To File Status In Group (the response from Get File Of Export To File In Group should be your report export), go back to the Definition page and, for these two Actions, scroll down to the Response section and click on Add default response.

ResponseSample

There are two things to do here for both Actions. First, paste the responses you got on the Test pane into Body; next, paste:

retry-after 30

…into the Headers section and then click Import. This will make the output of the Actions much easier to consume in Power Automate later on.

Your custom connector is now ready, and in part 2 of this series I’ll show you how to use it in Power Automate.

[Thanks to Jaime Tarquino and Chris Finlan for their help getting this working]

Five Mistakes To Avoid When Migrating To Power BI From Another BI Platform

Now that Power BI is so popular, it’s becoming more and more common for organisations to migrate their reports from other BI platforms to Power BI. And why not? You’ll be moving to a modern, industry-leading, cloud-native BI platform and probably saving a lot of money in licensing costs too. As a Microsoft employee I wholeheartedly encourage this (obviously!) but it’s not without its pitfalls. In this post I’m going to highlight some common mistakes I’ve seen people make when migrating to Power BI so that you can avoid making them yourself. It isn’t always a simple lift-and-shift job…

#1 You may need to remodel your source data

Every report has a data source and getting source data in the right format for your BI platform is a substantial task – so much so that you might be tempted to put Power BI on top of the data sources you have created for your previous BI platform with no changes. However, different BI platforms need their data in different formats. Many BI platforms like their data munged together in one big table, sometimes even with data at different granularities in the same table. Power BI, on the other hand, likes its source data modelled as a star schema (you can find out what a star schema is and why it’s important here). If you don’t model your data as a star schema you may find that you see incorrect values in your reports, that report performance is poor, and that it’s a lot harder to write the DAX calculations that you need.

#2 Understand how Power BI works with data

Even if you have a star schema, you also have to understand that how Power BI works with data may be different to the way your previous BI tool worked. As Marco Russo and Alberto Ferrari explain here, Power BI wants you to invest time up-front to create a semantic layer, and once you have done this you’ll find that building reports and calculations is much quicker and easier. If you’re wondering why you can’t just write a SQL query to get the data for that chart you need to build, you’ve made this mistake.

#3 Don’t try to recreate the exact functionality of your old BI platform

If you’re building a Power BI report to replace an existing report on a legacy platform, and you ask your users what they want the report to look like, the most common reply is “Just like the old one”. This is a danger sign! One surefire way to make a Power BI report hard to build and slow to run is to try to make it do something it wasn’t designed to do.

One example of this is the way some users ask for reports that recreate the gigantic tables they are used to seeing in their old Excel reports. These tables may be slow to render in Power BI (especially if they have hundreds or thousands of rows) and what’s more, they don’t make the data easy to understand – better visualisation options are always available. My colleagues Adam and Patrick discussed this problem in detail in a recent video that’s well worth watching.

Another, more specific, example is the requirement to allow end users to change the measures or fields that are shown in a visual in a Power BI report. This is a feature we have on our public roadmap but I’ve seen many customers tie themselves in knots trying to implement this using existing functionality; it is possible, but just because something is possible doesn’t mean you should do it. Once again, needless complexity, slower development and performance problems are the result.

Instead what you should be doing is understanding the business problem the report is trying to solve and implementing a solution using the functionality Power BI has built in. I know that users can be very stubborn about things like this, but trust me, it’s better to have the battle now rather than deal with the consequences of doing what they want you to do.

#4 Don’t forget about Analyze in Excel and Paginated Reports

Following on from the last point, if your users want to be able to explore their data by changing the measures and fields used in a visual they are probably thinking of how they use PivotTables and PivotCharts in Excel. And if that’s what they want, why don’t you let them use the real thing? Power BI’s Analyze in Excel feature doesn’t give you something that looks like an Excel PivotTable, it gives you an actual PivotTable connected to data stored in Power BI. It’s the functionality your users are comfortable with, but with none of the downsides of traditional Excel reports such as tedious, error-prone, manual data refreshes. What’s more, you can also use Excel cube functions for more complex report layouts, such as those needed by financial reports. This video by Peter Myers is a great introduction to cube functions – they work with Power BI datasets in the same way they work with Analysis Services cubes or Power Pivot.

Similarly if, after everything I’ve just said, you still want to create reports sourced from hand-written SQL queries with gigantic tables that can be printed easily, you’ll find that Power BI Paginated Reports work much better than regular Power BI reports. Closely related to SQL Server Reporting Services, one of the most popular BI platforms in history, paginated reports have a different set of strengths compared to regular Power BI reports. Use the right tool for the job!

#5 This is your chance to change who does what

Migrating to Power BI is a chance to make a break from the old ways of doing things, and that includes who does all the work. If you’ve worked in BI for any length of time you’ll have seen plenty of examples of these two extreme approaches to BI development and the pain that goes with them:

  • Corporate BI, where all the work is done by the IT department – which quickly becomes the bottleneck, unresponsive to business needs and unable to understand them properly.
  • Excel Hell, where the business builds everything itself – and ends up with a lot of duplicated effort, multiple versions of the truth, and reports that break when the person who built them leaves for a new job.

Power BI lets you chart a course between these two. It’s easy enough to use, and affordable enough, for you to deploy it to a larger number of users in your organisation than any other BI tool except Excel. At the same time it gives you the tools you need to avoid the problems of Excel Hell: centralised data, automated refresh, security, monitoring and a lot more. This, for me, is the key to all successful Power BI deployments: empowered users and the IT department working together as a team, each doing what they do best.
