Calling The Power BI Enhanced Refresh API From Power Automate, Part 6: Cancelling Dataset Refreshes

The thing that got me excited about the Power BI Enhanced Refresh API, and which inspired me to start this series of posts, was the fact that for the first time it gives you a way of cancelling Power BI dataset refreshes. In this last post in the series I’ll show you how you can add an Action to your Power Automate custom connector to cancel a refresh and how you can use it in a Power Automate Flow.

Adding a new Action is straightforward and very similar to what I’ve shown in previous posts in this series. As before, click on the New action button on the Definition stage of the custom connector wizard, fill in the General section with appropriate names:

…then use the Import from sample option to populate the Request section using the DELETE verb and the following URL for the Cancel Refresh In Group API endpoint:

https://api.powerbi.com/v1.0/myorg/groups/{groupId}/datasets/{datasetId}/refreshes/{refreshId}

The refreshId, which uniquely identifies the refresh operation you which to cancel can be obtained from the Action I described in part 4 of this series that can be used to kick off a refresh.

Using this new Action in a Flow is more interesting. Here’s an example Flow that kicks off a dataset refresh but cancels it if it goes on for more than 120 seconds; it’s similar to, but more complex than, the Flow I described in this post that checks the status of a refresh. At the top level it looks like this:

The first two actions kick off a full refresh of a dataset:

The next two Actions initialise two variables:

  • ElapsedRefreshDurationSeconds is an integer used to hold the amount of time in seconds that the refresh has been running for each time it is checked
  • StopCheckingRefreshStatus is a boolean variable used to determine whether the upcoming Do until loop should be terminated or not

The contents of the Do until loop look like this:

After waiting 30 seconds the status of the refresh is checked. If the status is “Unknown” then the refresh is ongoing and the next step is to check how long it has been running for; otherwise the refresh has completed at the StopCheckingRefreshStatus variable can be set to true so the Do until loop can be terminated. Drilling into the “Is refresh ongoing” conditional Action:

If the refresh is still in progress then the current elapsed duration in seconds of the refresh is calculated using the following expression (this blog post showed me how to calculate durations in Power Automate):

div(sub(ticks(utcNow()), ticks(outputs('Get_status_of_one_dataset_refresh')?['body/startTime'])),10000000)

If that elapsed duration is more than 120 seconds then the refresh should be cancelled; otherwise no action is taken. Drilling into the Yes branch of the conditional Action:

The refresh is cancelled by calling the Action created at the beginning of this post; after the cancellation the StopCheckingRefreshStatus variable is set to true. Here are details of the Actions from the Yes branch that do this:

That’s it for this post and this series. Since I started the series over a month ago a few other articles on similar subjects have been published on LinkedIn that you might want to check out, by my colleagues Romain Casteres and Rui Romano; as more enterprise BI solutions are built on Power BI, I know there will be a lot more people implementing complex refresh scenarios using Power Automate/Logic Apps and the Enhanced Refresh API.

Calling The Power BI Enhanced Refresh API From Power Automate, Part 5: Getting A Dataset’s Refresh History

In the last post in this series I showed how to get the status of an individual Power BI dataset refresh. In this post I’ll show how to get the history of a dataset’s refreshes and save it to a file in OneDrive for Business – useful if you want to do some analysis on refresh performance or the causes of failures.

This is one of the most straightforward things to do with the Power BI Enhanced Refresh API , especially once you’ve already got a custom connector created in the way I’ve shown in the previous posts in this series. You just need to add a new Action to your custom connector, fill in the General section:

…then in the Request section use Import from sample with the GET verb on the following URL (for Get Refresh History In Group):

https://api.powerbi.com/v1.0/myorg/groups/{groupId}/datasets/{datasetId}/refreshes?$top={$top}

The only thing to do after this is edit the optional $top query parameter (which allows you to limit the number of refreshes whose history you return) so it takes an integer rather than a string:

Update the custom connector and you can now use it in a simple Flow to save the JSON returned by the call to the API into a JSON file:

You can then connect to this data very easily via Power Query, for example:

Rather than trigger a Flow like this manually, in the real world you’d want to run it on a schedule, perhaps every day. You’d then end up with a folder full of JSON files you could analyse and as much history stored as you wanted. It should be possible to load and dedupe this data (multiple JSON files will contain overlapping refresh history) using Power BI dataflows and/or datamarts but that’s something I’ll leave for another blog post…

Calling The Power BI Enhanced Refresh API From Power Automate, Part 4: Getting The Status Of a Refresh

So far in this series (see part 1, part 2 and part 3) I’ve looked at how you can create a Power Automate custom connector that uses the Power BI Enhanced Refresh API to kick off a dataset refresh. That’s only half the story though: once the refresh has been started you need to know if it has finished and, if so, whether it finished successfully or not. In this post I’ll show how to do this.

When you start a refresh using the Enhanced Refresh API Power BI returns a unique identifier for that refresh operation and you will need to modify your Power Automate custom connector to make it easy to capture and use that identifier. You can do this on the Definition stage of the custom connector wizard for an Action that kicks off a refresh (ie any of the Actions I built in the previous posts in this series) by going to the Response section, clicking on the Add default response button and pasting the following sample response (from here in the docs) into the Headers box and clicking Import:

x-ms-request-id: 87f31ef7-1e3a-4006-9b0b-191693e79e9e
Location: https://api.powerbi.com/v1.0/myorg/groups/f089354e-8366-4e18-aea3-4cb4a3a50b48/datasets/cfafbeb1-8037-4d0c-896e-a46fb27ff229/refreshes/87f31ef7-1e3a-4006-9b0b-191693e79e9e

It’s the x-ms-request-id custom header that will contain the unique identifier for the refresh.

The next step is to create a new Action to check the status of the refresh using the Get Refresh Execution Details In Group API endpoint. To do this, click on the New action button on the Definition stage of the custom connector wizard and fill in the details in the General section:

Then fill in the Request section. Using Import from sample, select the verb GET, enter the following URL:

https://api.powerbi.com/v1.0/myorg/groups/{groupId}/datasets/{datasetId}/refreshes/{refreshId}

Then in the Response section click Add default response and paste in the (rather long) sample JSON response from the docs found here. The response should look like this:

You can now use this new Action in a Flow. Here’s a simple example:

The first two Actions here, “Manually trigger a flow” and “Incremental Refresh”, kick off a refresh in the way I’ve shown in the previous posts in this series. The “Do until” Action is where the interesting stuff happens:

What this does is:

  • First of all, the Delay Action waits for 30 seconds. There’s no point checking the status of a refresh immediately after it has been started, and in this case 30 seconds is a reasonable amount of time to wait. Depending on how long your refresh takes you may want to use a longer delay.
  • Then call the new Action created above to check the status of the refresh that has just been started, using the x-ms-request-id value returned by the Incremental Refresh Action.
  • If the status (returned by the Get status of one dataset refresh Action) is not Unknown then the refresh has completed and the Do until loop can be terminated. If the status is Unknown then the refresh has either not started or is still in progress so the loop should be run again, ie it will wait for another 30 seconds and then check the status again.

Finally, once the refresh has completed the Condition Action sends an email telling me the outcome: if the status is Completed then the refresh has succeeded; if it is not then the status tells you at a high-level what went wrong (you can look at the extendedStatus and the contents of the objects JSON array for more details).

Why I’m Excited About The New DAX EvaluateAndLog() Function

If you aren’t active on Twitter or LinkedIn (or you don’t spend a lot of time on them) you may not have heard about the new EvaluateAndLog() function. It’s not documented yet and not mentioned on any of the official blogs, but you can read about it in the last two posts on Jeffrey Wang’s blog here:

[If you’re wondering who Jeffrey is, he’s one of the resident geniuses that Microsoft employs on the Power BI development team – he’s one of the people Marco and Alberto go to when they need help 😀]

I think this is the most exciting thing to happen in DAX since the introduction of variables. Why? Well we all know that writing DAX can be hard, and part of why it’s hard is that it’s hard to debug. Things like Tabular Editor 3’s DAX Debugger can be really useful but the EvaluateAndLog() function is the first time the AS/Vertipaq engine has given us the ability to see the intermediate values used by a DAX expression to calculate its output without lots of tedious commenting and uncommenting or the use of ConcatenateX(). Have you ever wanted a watch window to see what’s going on inside a measure? Have you ever wanted to debug your DAX by printing out values to a console or a log file? Well that’s what EvaluateAndLog() allows you to do.

It’s still early days: Jeffrey hasn’t blogged about all of its secrets yet and the functionality needs a bit of polishing. That said, if you’re passionate about DAX I strongly suggest you check it out right now because it will make your life a lot easier.

Calling The Power BI Enhanced Refresh API From Power Automate, Part 3: Incremental Refresh Options

If you’ve read part 1 or part 2 of this series you’ll know how you can create a Power Automate custom connector to call the Power BI Enhanced Refresh API and get fine-grained control over your refreshes. In this post I will take a look at the two parameters in the Enhanced Refresh API for datasets that use incremental refresh: applyRefreshPolicy and effectiveDate. They are documented here but, as always, some worked examples are helpful to really understand how they work.

Before I carry on, I’ll assume you have created a custom connector for Power Automate that supports these two parameters. The easiest way to do this is to use the following JSON for the body when using “Import from sample” to create a new Action:

{
    "type": "Full",
    "commitMode": "transactional",
    "applyRefreshPolicy": true,
    "effectiveDate": "12/31/2013"
}

The Action should look like this in the Test stage of the custom connector wizard:

Let’s start with effectiveDate. By default, incremental refresh allows you to do things like “refresh only the last month of data” with the unstated assumption that “last month” means “last month relative to today’s date“. If the data in a table is loaded on a monthly, quarterly or yearly basis then this default behaviour of incremental refresh may lead to unexpected performance problems or data loss. The effectiveDate parameter allows you to specify a date to use instead of today’s date as the starting point for incremental refresh.

For example, I have a dataset connected to the old Adventure Works DW SQL Server sample database. If I configure incremental refresh on the FactInternetSales table as follows:

…and refresh in the Power BI Service, I can see in SQL Server Management Studio that the following partitions have been created to hold the data in the table:

As you can see I have yearly partitions created relative to today’s date. Unfortunately my old copy of the Adventure Works DW database only has data for the years 2001 to 2004, which means that after the refresh no data is loaded into the dataset. However, if I refresh from Power Automate using my custom connector and I set effectiveDate to 31st December 2004 like so:

…then the following partitions get created in the FactInternetSales table and all my data is loaded into them:

Now let’s consider applyRefreshPolicy. If you have set up incremental refresh on a table then when you do a normal scheduled refresh of your dataset only some of the data is reloaded, saving you a lot of time. However there will be occasions where you want to override this behaviour and force a full refresh of a table even though incremental refresh has been configured: for example some values in your historic data may have been updated, so the only safe way to ensure the correct data is in Power BI is to reload the entire table. I’ve seen people schedule a full refresh of their data every week or every month just in case there were any updates that they weren’t informed of. Setting applyRefreshPolicy to false (the default is true) allows you to do this.

For example, the FactInternetSales table in the dataset described above is configured to refresh only the last two years of data after the initial full refresh, which means that with an effectiveDate of 31st December 2004 only the 2004 and 2003 partitions are refreshed. However, if you set applyRefreshPolicy to false (in Power Automate false for a boolean parameter will appear as a “No”), like so:

…then all partitions in the FactInternetSales table will get refreshed.

It’s important to mention that the applyRefreshPolicy parameter only works with certain settings for the type parameter, as described here, and certain settings for the commitMode parameter, as described here.

Of course you can use these two parameters in TMSL refresh commands without using the Enhanced Refresh API or Power Automate, but I think they are particularly useful when used from a Power Automate custom connector because of the extra refresh scheduling flexibility you get with Power Automate: you may want to run a normal refresh every day but a full refresh with applyRefreshPolicy set to false once a week, for example, or you may want to only refresh one fact table out of many every quarter and set effectiveDate when you do that.

Calling The Power BI Enhanced Refresh API From Power Automate, Part 2: Refreshing Specific Tables And Partitions

In part 1 of this series I showed how you could create a very simple custom connector for Power Automate that allows you to call the Power BI Enhanced Refresh API. However, I didn’t show what I think is the main reason you’d want to build a custom connector: the ability to refresh specific tables or partitions in a dataset rather than the entire dataset. The main reason you’d want to do this is to speed up refresh performance, and there are two reasons why it can do so:

  1. Some tables in your dataset may not need refreshing every time because the data in them hasn’t changed, so not refreshing them saves you time and CPU.
  2. Refreshing the partitions in a table and increasing the maxParallelism property can also significantly speed up refresh performance, as I showed here.

This is a very common requirement and there are a lot of blog posts and videos out there on the subject but I think using a Power Automate custom connector is probably the easiest way of doing this, although it only works for datasets in a Premium or PPU capacity.

To implement an Action that does this in an existing connector, click on the New Action button and follow the instructions from my last post. Fill in the fields in the General box:

Then click on the Import from sample button, select the POST verb, enter the URL

	https://api.powerbi.com/v1.0/myorg/groups/{groupId}/datasets/{datasetId}/refreshes

…leave the Headers box blank and then, in the Body box, enter the sample Body from here:

{
    "type": "Full",
    "commitMode": "transactional",
    "maxParallelism": 2,
    "retryCount": 2,
    "objects": [
        {
            "table": "DimCustomer",
            "partition": "DimCustomer"
        },
        {
            "table": "DimDate"
        }
    ]
}

[Note that you can’t have two Actions with the same URL in the same custom connector]

There’s some extra work to do here though. Click on the dropdown on the body parameter in the Request box and click Edit:

Next, click Edit under “table” in the Parameter section:

…and edit the title and description to reflect the fact that this parameter will contain the list of tables and parameters you want to be refreshed:

Once you’ve saved the connector you can test it – but one thing that caught me out is that the Test step in the custom connector designer doesn’t actually handle the tables and partitions parameter properly (something mentioned in passing here) so you’ll want to test this in a real flow. Here’s an example that refreshes the DimDate table and two partitions (FactInternetSales2001 and FactInternetSales2003) of the FactInternetSales table:

Notice that you can add as many tables or partitions to be refresh as you want by clicking the Add new item button; if you want to refresh a table just enter the table name and leave the partition box empty, but if you want to refresh a partition you need to fill in both the table name and the partition name.

And that’s it, at least for now. Next time I’ll take a look at some of the options for datasets that use incremental refresh.

Calling The Power BI Enhanced Refresh API From Power Automate, Part 1: Creating A Basic Custom Connector

I love the new Power BI Enhanced Refresh API: it allows you to do things like refresh individual tables in your dataset, override incremental refresh policies, control the amount of parallelism, cancel refreshes and a lot more, while being easier to use than the XMLA Endpoint. However, like the XMLA Endpoint, one problem remains: how can you schedule a dataset refresh using it? One option is to create a custom connector for Power Automate (similar to what I described here for the Export API, before the Power BI export actions for Power Automate had been released): this not only allows you to schedule more complex refreshes but also gives you more flexibility over scheduling and do things like send emails if refreshes fail.

There’s no point going into the details of creating a custom connector for a Power BI API endpoint because it’s been done before, most notably by Konstantinos Ioannou who has a very detailed walkthrough here which I strongly suggest you read. There’s only one thing that has changed since he wrote that post: the Power BI App Registration Tool is now here. You also need to give the app you create the “Read and write all datasets” permission:

When you get to the Definition stage of creating the connector there are some choices to make. The Enhanced Refresh API has a lot of functionality and it could be very complicated to build a custom connector that supports everything – especially if you or your users don’t need all that functionality, or if a lot of options could confused your users. As a result it could be better to only expose a subset of the functionality – and that’s what I’ll do in this first post.

Let’s take a few basic options to start off with: the refresh type (ie do you want to do a full refresh, clear the data out of the dataset etc?), the commit mode (do you want everything you’re refreshing to be refreshed in single transaction?), the maximum amount of parallelism and the number of retries if refresh fails. Click on the New Action button and fill in the details in the General section:

Then, in the Request section, click on Import from sample and select the verb POST, enter the following URL:
https://api.powerbi.com/v1.0/myorg/groups/{groupId}/datasets/{datasetId}/refreshes

…leave the Headers box empty and then enter the following in the Body box:

{
    "type": "Full",
    "commitMode": "transactional",
    "maxParallelism": 2,
    "retryCount": 2
}

This will create an Action that allows you to set the type, commitMode, maxParallelism and retryCount options. The Request section of the Definition step should look like this:

You can finish creating the connector as per Konstantinos’s instructions after that.

Finally, to test this connector in a flow, you can build a new instant cloud flow that looks like this:

[You can find the workspace ID (the groupId for the API) and the dataset ID by going to the dataset’s Settings page in the Power BI Service and getting them from the URL as detailed here]

You will also need to turn off the Asynchronous Pattern option in the Settings dialog of the action:

You should now have a flow which can kick off a dataset refresh with a few options. This is only the beginning though: there are more options that can be added, and this flow only starts a refresh – it doesn’t tell you whether the refresh succeeded or not, or allow you to cancel a refresh, or anything else fun like that. We’ll investigate all of these things and more in future posts in this series.

Custom Queries For “Detect Data Changes” In Power BI Incremental Refresh

One feature of Power BI incremental refresh I’ve always been meaning to test out is the ability to create your own M queries to work with the “detect data changes” feature, and last week I finally had the chance to do it. The documentation is reasonably detailed but I thought it would be a good idea to show a worked example of how to use it to get direct control over what data is refreshed during an incremental refresh.

First of all I created a simple dataset with incremental refresh enabled. The source was a SQL Server table with two columns: Date (actually a datetime column) and Sales.

I then configured incremental refresh as follows:

In the background this created six yearly partitions:

Nothing interesting here so far, but the real challenge lies ahead: how exactly do you use custom queries with “detect data changes”?

I created a new table in my SQL Server database called DetectDataChangesTable with one row for every partition in the dataset (even though the incremental refresh configuration above means only the 2021 and 2022 partitions will ever be refreshed) and the values for the RangeStart and RangeEnd M parameters that would be set when each partition is refreshed:

I then created an M query in my dataset called DetectDataChangesQuery that connected to this table, filtered the RangeStart column by the current value of the RangeStart M parameter and the RangeEndColumn by the current value of the RangeEnd M parameter, and then returned just the Output column:

let
  Source = Sql.Databases(
    "ThisIsMySQLServerName"
  ),
  IncrementalRefreshDemo = Source
    {[Name = "IncrementalRefreshDemo"]}
    [Data],
  dbo_DetectDataChangesTable
    = IncrementalRefreshDemo
    {
      [
        Schema = "dbo",
        Item = "DetectDataChangesTable"
      ]
    }
    [Data],
  FilterByParams = Table.SelectRows(
    dbo_DetectDataChangesTable,
    each [RangeStart]
      = RangeStart and [RangeEnd]
      = RangeEnd
  ),
  #"Removed Other Columns"
    = Table.SelectColumns(
    FilterByParams,
    {"Output"}
  )
in
  #"Removed Other Columns"

Here’s the output of the query in the Power Query Editor with the RangeStart M parameter set to 1/1/2021 and the RangeEnd M parameter set to 1/1/2022:

The important thing to point out here is that while the documentation says the query must return a scalar value, in fact the query needs to return a table with one column and one row containing a single scalar value.

After publishing the dataset once again, then next thing to do was to set the pollingExpression property described in the documentation. I did this by connecting to the dataset via the XMLA Endpoint using Tabular Editor 3, then clicking on the Sales table and looking in the Refresh Policy section in the Properties pane. I set the property to the name of the query I just created, DetectDataChangesQuery:

I then forced a full refresh of the Sales table, including all partitions, by running a TMSL script in SQL Server Management Studio and setting the applyRefreshPolicy parameter to false, as documented here. Here’s the TMSL script:

{
  "refresh": {
    "type": "full",
	"applyRefreshPolicy": false,
    "objects": [
      {
        "database": "IncrementalRefreshDetectDataChangesTest",
        "table": "Sales"
      }
    ]
  }
}

Scripting the entire table out to TMSL I could then see the refreshBookmark property on the two partitions (2021 and 2022) which could be refreshed in an incremental refresh set to 1, the value returned for those partitions in the Output column of the DetectDataChangesQuery query:

The refreshBookmark property is important because it stores the value that Power BI compares with the output of the DetectDataChangesQuery query on subsequent dataset refreshes to determine if the partition needs to be refreshed. So, in this case, the value of refreshBookmart is 1 for the 2021 partition but if in a future refresh the DetectDataChangesQuery returns a different value for this partition then Power BI knows it needs to be refreshed.

I then went back to the DetectDataChangesTable table in SQL and set the Output column to be 2 for the row relating to the 2021 partition:

Next, went back to SQL Server Management Studio and refreshed the table using a TMSL script with applyRefreshPolicy set to true (which is the default, and what would happen if you refreshed the dataset through the Power BI portal).

{
  "refresh": {
    "type": "full",
	"applyRefreshPolicy": true,
    "objects": [
      {
        "database": "IncrementalRefreshDetectDataChangesTest",
        "table": "Sales"
      }
    ]
  }
}

In the Messages pane of the query window I saw that Power BI had detected the value returned by DetectDataChangesQuery for the 2021 partition had changed, and that therefore the partition needed to be refreshed:

Lower down in the Messages pane the output confirmed that only the 2021 partition was being refreshed:

In Profiler I saw three SQL queries. The first two were to query the DetectDataChangesTable table for the two partitions that might be refreshed to check to see if the value returned in the Output column was different:

select [_].[Output]
from [dbo].[DetectDataChangesTable] as [_]
where ([_].[RangeStart] = convert(datetime2, '2022-01-01 00:00:00') 
and [_].[RangeStart] is not null) 
and ([_].[RangeEnd] = convert(datetime2, '2023-01-01 00:00:00') 
and [_].[RangeEnd] is not null)
select [_].[Output]
from [dbo].[DetectDataChangesTable] as [_]
where ([_].[RangeStart] = convert(datetime2, '2021-01-01 00:00:00') 
and [_].[RangeStart] is not null) 
and ([_].[RangeEnd] = convert(datetime2, '2022-01-01 00:00:00') 
and [_].[RangeEnd] is not null)

The third was to get the data for the 2021 partition, which was the only partition that needed to be refreshed:

select [_].[Date],
    [_].[Sales]
from [dbo].[Sales] as [_]
where [_].[Date] >= convert(datetime2, '2021-01-01 00:00:00') 
and [_].[Date] < convert(datetime2, '2022-01-01 00:00:00')

Finally, scripting the Sales table again to TMSL after the refresh had completed showed that the refreshBookmark property had changed to 2 for the 2021 partition:

And that’s it. I really like this feature but I’ve never seen anyone use this in the real world though, which is a shame. Maybe this blog will inspire someone out there to try it in production?

[UPDATE] An extra point to add is that if you use this functionality, the data source for the partitions must be the same as the data source used for the polling queries, otherwise you’ll get an error.

Yet Another Power BI (And Synapse) Book Roundup

I like free stuff and I like books, so of course I like free books – and it seems that the more I provide free publicity for relevant books here the more free books I get sent. I’ve now got enough to merit writing another post covering those I’ve received recently from various publishers and authors. As always these are not reviews, just short summaries of books you might want to check out.

Microsoft Power BI Data Analyst Certification Guide, by Orrin Edenfield and Edward Corcoran

Studying for a certification is a great way to learn a technology and this book is intended for those studying for the PL-300 Power BI Data Analyst exam. In terms of technical content this book is a good general introduction to Power BI development and administration, so nothing out of the ordinary, but knowing that the book is written to cover the exam syllabus and the generous number of practice questions would be the reason to buy it.

Power BI for the Excel Analyst, by Wyn Hopkins

Taking a different angle on learning Power BI is Wyn Hopkins, who has aimed his book at people coming from the Excel community (Wyn himself is a well-known Excel and Power BI MVP who has a great YouTube channel). I’m surprised there aren’t more people writing content like this since the vast majority of people using Power BI come from this background; Rob Collie cornered the market years ago but has been very quiet recently. Once again it’s an introductory guide to Power BI development but there’s a healthy amount of real-world experience inside as well as opinion, which I like – it not only makes the book more valuable but also more readable.

Pro Power BI Dashboard Creation, by Adam Aspin

Adam Aspin is a prolific author of Power BI books and this one focuses on the mechanics of building reports and dashboards. It’s not one of those preachy “data visualisation” books but a guide to the Power BI report canvas, all the visuals and their properties and settings: there’s a whole chapter on drilling up and down, for example. As a result even an experienced Power BI report designer will probably find something in it that they didn’t know.

Azure Synapse Analytics Cookbook, by Gaurav Agarwal and Meenakshi Muralidharan

Not strictly a Power BI book, I know, but a general introduction to Synapse in the worked example/cookbook format – although there is a chapter on how to use Power BI with Synapse. Gaurav is a colleague of mine on the Power BI CAT team at Microsoft so of course I want to call out this new book that he has co-written! My Synapse knowledge is not as good as it should be so I learned a few things reading it.

The Rank Column Transform In Power Query Online

[This post was originally published on the official Power Query blog, which has now been taken down. I’m republishing all my posts there to this blog to ensure the content remains available.]

Yet another new transformation has been added to Power Query Online: you can now add columns containing ranks to a table. To see how to use it, consider the following table containing sales and profit values for various products:

Let’s say you want to add a new column to this table containing the rank of each product, calculated according to its sales. To do this select the Sales column so that it is highlighted (as shown in the screenshot above) and then go to the Add Column tab on the ribbon in Power Query Online and click on the “Rank column” button:

When you do this the following dialog will appear:

Click OK and the dialog will close, and you will see that a new column has been added to your table containing the ranks for each product by sales and that the table itself has been sorted by the values in the Sales column too:

In most cases this will give you the values you want, but by clicking on the Advanced radio button in the Rank dialog you’ll find more options for controlling how the rank is calculated.

First of all, notice that because Oranges and Pears have the same sales value they both have a rank of 2 and that the next product, Apples, has a rank of 4. This is because the default rank method, Standard competition, has been used. There are there are two other methods of calculating ranks though. Choosing the Dense method in the Advanced tab of the dialog from the “Rank method” dropdown:

…changes the rank value for Apples to 3:

Choosing the Ordinal rank method ignores any tied values like so:

A different way of handling tied values for sales is to consider profit values, and this is also possible on the Advanced tab. Clicking the “Add ranking” button allows you to add another column to sort by, so that when sales is tied then the product with the highest profit will get the highest rank:

Here’s the output:

Notice now that Pears has a rank of 2 and Oranges has a rank of 3 because, even though they have the same sales, Pears has a higher value for Profit.

Behind the scenes this new functionality is powered by a new M function called Table.AddRankColumn which Reza Rad has already blogged about in detail here.