Calling The Power BI Enhanced Refresh API From Power Automate, Part 3: Incremental Refresh Options

If you’ve read part 1 or part 2 of this series you’ll know how you can create a Power Automate custom connector to call the Power BI Enhanced Refresh API and get fine-grained control over your refreshes. In this post I will take a look at the two parameters in the Enhanced Refresh API for datasets that use incremental refresh: applyRefreshPolicy and effectiveDate. They are documented here but, as always, some worked examples are helpful to really understand how they work.

Before I carry on, I’ll assume you have created a custom connector for Power Automate that supports these two parameters. The easiest way to do this is to use the following JSON for the body when using “Import from sample” to create a new Action:

{
    "type": "Full",
    "commitMode": "transactional",
    "applyRefreshPolicy": true,
    "effectiveDate": "12/31/2013"
}

The Action should look like this in the Test stage of the custom connector wizard:

Let’s start with effectiveDate. By default, incremental refresh allows you to do things like “refresh only the last month of data” with the unstated assumption that “last month” means “last month relative to today’s date”. If the data in a table is loaded on a monthly, quarterly or yearly basis then this default behaviour of incremental refresh may lead to unexpected performance problems or data loss. The effectiveDate parameter allows you to specify a date to use instead of today’s date as the starting point for incremental refresh.

For example, I have a dataset connected to the old Adventure Works DW SQL Server sample database. If I configure incremental refresh on the FactInternetSales table as follows:

…and refresh in the Power BI Service, I can see in SQL Server Management Studio that the following partitions have been created to hold the data in the table:

As you can see I have yearly partitions created relative to today’s date. Unfortunately my old copy of the Adventure Works DW database only has data for the years 2001 to 2004, which means that after the refresh no data is loaded into the dataset. However, if I refresh from Power Automate using my custom connector and I set effectiveDate to 31st December 2004 like so:

…then the following partitions get created in the FactInternetSales table and all my data is loaded into them:
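To make the shape of the request concrete, here is the request body being sent to the API in this case, as a Python sketch rather than the Power Automate UI (the build_refresh_body helper is hypothetical, just for illustration):

```python
import json

def build_refresh_body(effective_date):
    """Build an Enhanced Refresh API request body that overrides
    the date incremental refresh treats as 'today'."""
    return {
        "type": "Full",
        "commitMode": "transactional",
        # effectiveDate replaces today's date when the rolling-window
        # partitions are calculated
        "effectiveDate": effective_date,
    }

body = build_refresh_body("12/31/2004")
print(json.dumps(body, indent=2))
```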

Now let’s consider applyRefreshPolicy. If you have set up incremental refresh on a table, then when you do a normal scheduled refresh of your dataset only some of the data is reloaded, saving you a lot of time. However, there will be occasions when you want to override this behaviour and force a full refresh of a table even though incremental refresh has been configured: for example, some values in your historic data may have been updated, so the only safe way to ensure the correct data is in Power BI is to reload the entire table. I’ve seen people schedule a full refresh of their data every week or every month just in case there were any updates that they weren’t informed of. Setting applyRefreshPolicy to false (the default is true) allows you to do this.

For example, the FactInternetSales table in the dataset described above is configured to refresh only the last two years of data after the initial full refresh, which means that with an effectiveDate of 31st December 2004 only the 2004 and 2003 partitions are refreshed. However, if you set applyRefreshPolicy to false (in Power Automate false for a boolean parameter will appear as a “No”), like so:

…then all partitions in the FactInternetSales table will get refreshed.
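In terms of the raw API call, the body the flow sends in this case looks something like the following sketch (note that Python’s False serialises to the JSON false the API expects):

```python
import json

# A sketch of the request body that forces a full reload of all
# partitions even though an incremental refresh policy is configured.
body = {
    "type": "Full",
    "commitMode": "transactional",
    "applyRefreshPolicy": False,  # serialises to JSON false
}
print(json.dumps(body, indent=2))
```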

It’s important to mention that the applyRefreshPolicy parameter only works with certain settings for the type parameter, as described here, and certain settings for the commitMode parameter, as described here.

Of course you can use these two parameters in TMSL refresh commands without using the Enhanced Refresh API or Power Automate, but I think they are particularly useful when used from a Power Automate custom connector because of the extra refresh scheduling flexibility you get with Power Automate: you may want to run a normal refresh every day but a full refresh with applyRefreshPolicy set to false once a week, for example, or you may want to only refresh one fact table out of many every quarter and set effectiveDate when you do that.

Calling The Power BI Enhanced Refresh API From Power Automate, Part 2: Refreshing Specific Tables And Partitions

In part 1 of this series I showed how you could create a very simple custom connector for Power Automate that allows you to call the Power BI Enhanced Refresh API. However, I didn’t show what I think is the main reason you’d want to build a custom connector: the ability to refresh specific tables or partitions in a dataset rather than the entire dataset. The main benefit of doing this is faster refresh performance, and there are two ways it can deliver that:

  1. Some tables in your dataset may not need refreshing every time because the data in them hasn’t changed, so not refreshing them saves you time and CPU.
  2. Refreshing the partitions in a table individually while increasing the maxParallelism property can also significantly speed up refresh performance, as I showed here.

This is a very common requirement and there are a lot of blog posts and videos out there on the subject, but I think using a Power Automate custom connector is probably the easiest way of doing it, although it only works for datasets in a Premium or PPU capacity.

To implement an Action that does this in an existing connector, click on the New Action button and follow the instructions from my last post. Fill in the fields in the General box:

Then click on the Import from sample button, select the POST verb, enter the URL

	https://api.powerbi.com/v1.0/myorg/groups/{groupId}/datasets/{datasetId}/refreshes

…leave the Headers box blank and then, in the Body box, enter the sample Body from here:

{
    "type": "Full",
    "commitMode": "transactional",
    "maxParallelism": 2,
    "retryCount": 2,
    "objects": [
        {
            "table": "DimCustomer",
            "partition": "DimCustomer"
        },
        {
            "table": "DimDate"
        }
    ]
}

[Note that you can’t have two Actions with the same URL in the same custom connector]

There’s some extra work to do here though. Click on the dropdown on the body parameter in the Request box and click Edit:

Next, click Edit under “table” in the Parameter section:

…and edit the title and description to reflect the fact that this parameter will contain the list of tables and partitions you want to be refreshed:

Once you’ve saved the connector you can test it – but one thing that caught me out is that the Test step in the custom connector designer doesn’t actually handle the tables and partitions parameter properly (something mentioned in passing here) so you’ll want to test this in a real flow. Here’s an example that refreshes the DimDate table and two partitions (FactInternetSales2001 and FactInternetSales2003) of the FactInternetSales table:

Notice that you can add as many tables or partitions to be refreshed as you want by clicking the Add new item button; if you want to refresh a table, just enter the table name and leave the partition box empty, but if you want to refresh a partition you need to fill in both the table name and the partition name.
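Putting that rule into code, here is a sketch of how the request body for that flow could be built (the refresh_object helper is hypothetical, just for illustration):

```python
import json

def refresh_object(table, partition=None):
    """Build one entry for the Enhanced Refresh API "objects" array.
    A table-only entry refreshes the whole table; an entry with both
    names refreshes a single partition."""
    obj = {"table": table}
    if partition is not None:
        obj["partition"] = partition
    return obj

body = {
    "type": "Full",
    "commitMode": "transactional",
    "objects": [
        refresh_object("DimDate"),
        refresh_object("FactInternetSales", "FactInternetSales2001"),
        refresh_object("FactInternetSales", "FactInternetSales2003"),
    ],
}
print(json.dumps(body, indent=2))
```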

And that’s it, at least for now. Next time I’ll take a look at some of the options for datasets that use incremental refresh.

Calling The Power BI Enhanced Refresh API From Power Automate, Part 1: Creating A Basic Custom Connector

I love the new Power BI Enhanced Refresh API: it allows you to do things like refresh individual tables in your dataset, override incremental refresh policies, control the amount of parallelism, cancel refreshes and a lot more, while being easier to use than the XMLA Endpoint. However, like the XMLA Endpoint, one problem remains: how can you schedule a dataset refresh using it? One option is to create a custom connector for Power Automate (similar to what I described here for the Export API, before the Power BI export actions for Power Automate had been released): this not only allows you to schedule more complex refreshes but also gives you more flexibility over scheduling, letting you do things like send emails if refreshes fail.

There’s no point going into the details of creating a custom connector for a Power BI API endpoint because it’s been done before, most notably by Konstantinos Ioannou who has a very detailed walkthrough here which I strongly suggest you read. There’s only one thing that has changed since he wrote that post: the Power BI App Registration Tool is now here. You also need to give the app you create the “Read and write all datasets” permission:

When you get to the Definition stage of creating the connector there are some choices to make. The Enhanced Refresh API has a lot of functionality and it could be very complicated to build a custom connector that supports everything – especially if you or your users don’t need all that functionality, or if a lot of options could confuse your users. As a result it could be better to only expose a subset of the functionality – and that’s what I’ll do in this first post.

Let’s take a few basic options to start off with: the refresh type (i.e. do you want to do a full refresh, clear the data out of the dataset, etc.?), the commit mode (do you want everything you’re refreshing to be refreshed in a single transaction?), the maximum amount of parallelism and the number of retries if the refresh fails. Click on the New Action button and fill in the details in the General section:

Then, in the Request section, click on Import from sample and select the verb POST, enter the following URL:
https://api.powerbi.com/v1.0/myorg/groups/{groupId}/datasets/{datasetId}/refreshes

…leave the Headers box empty and then enter the following in the Body box:

{
    "type": "Full",
    "commitMode": "transactional",
    "maxParallelism": 2,
    "retryCount": 2
}

This will create an Action that allows you to set the type, commitMode, maxParallelism and retryCount options. The Request section of the Definition step should look like this:

You can finish creating the connector as per Konstantinos’s instructions after that.

Finally, to test this connector in a flow, you can build a new instant cloud flow that looks like this:

[You can find the workspace ID (the groupId for the API) and the dataset ID by going to the dataset’s Settings page in the Power BI Service and getting them from the URL as detailed here]
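If you prefer to automate this step, the IDs can be pulled out of that URL programmatically. Here’s a rough sketch, with the caveat that the URL pattern used below is an assumption based on the current Power BI Service and may change:

```python
import re

def extract_ids(settings_url):
    """Pull the workspace (group) ID and dataset ID out of a dataset
    Settings page URL. Assumes both IDs appear as GUIDs after
    '/groups/' and '/datasets/' respectively."""
    group = re.search(r"/groups/([0-9a-fA-F-]{36})", settings_url)
    dataset = re.search(r"/datasets/([0-9a-fA-F-]{36})", settings_url)
    if not (group and dataset):
        raise ValueError("IDs not found in URL")
    return group.group(1), dataset.group(1)

# Made-up example URL following the assumed pattern:
url = (
    "https://app.powerbi.com/groups/11111111-2222-3333-4444-555555555555"
    "/settings/datasets/aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
)
print(extract_ids(url))
```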

You will also need to turn off the Asynchronous Pattern option in the Settings dialog of the action:

You should now have a flow which can kick off a dataset refresh with a few options. This is only the beginning though: there are more options that can be added, and this flow only starts a refresh – it doesn’t tell you whether the refresh succeeded or not, or allow you to cancel a refresh, or anything else fun like that. We’ll investigate all of these things and more in future posts in this series.
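As an aside, nothing about the Enhanced Refresh API is specific to Power Automate: the same request the connector sends can be sketched in plain Python, for example. Acquiring the access token via Azure AD is out of scope here, so treat this as an outline rather than a complete script:

```python
import json
import urllib.request

def refresh_url(group_id, dataset_id):
    """The Enhanced Refresh API endpoint shown earlier in this post."""
    return (
        "https://api.powerbi.com/v1.0/myorg/groups/"
        f"{group_id}/datasets/{dataset_id}/refreshes"
    )

def start_refresh(group_id, dataset_id, access_token, body):
    """POST the refresh request; a 202 status means it was accepted."""
    req = urllib.request.Request(
        refresh_url(group_id, dataset_id),
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as response:
        return response.status

body = {
    "type": "Full",
    "commitMode": "transactional",
    "maxParallelism": 2,
    "retryCount": 2,
}
```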

Custom Queries For “Detect Data Changes” In Power BI Incremental Refresh

One feature of Power BI incremental refresh I’ve always been meaning to test out is the ability to create your own M queries to work with the “detect data changes” feature, and last week I finally had the chance to do it. The documentation is reasonably detailed but I thought it would be a good idea to show a worked example of how to use it to get direct control over what data is refreshed during an incremental refresh.

First of all I created a simple dataset with incremental refresh enabled. The source was a SQL Server table with two columns: Date (actually a datetime column) and Sales.

I then configured incremental refresh as follows:

In the background this created six yearly partitions:

Nothing interesting here so far, but the real challenge lies ahead: how exactly do you use custom queries with “detect data changes”?

I created a new table in my SQL Server database called DetectDataChangesTable with one row for every partition in the dataset (even though the incremental refresh configuration above means only the 2021 and 2022 partitions will ever be refreshed) and the values for the RangeStart and RangeEnd M parameters that would be set when each partition is refreshed:
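Generating those rows programmatically might look something like the following sketch. The partition years here (2017 to 2022) are an assumption chosen to match the six yearly partitions; adjust them for your own incremental refresh policy:

```python
from datetime import datetime

def polling_rows(first_year, last_year, initial_output=1):
    """One row per yearly partition: RangeStart/RangeEnd match the
    values Power BI sets the M parameters to when each partition is
    refreshed."""
    rows = []
    for year in range(first_year, last_year + 1):
        rows.append({
            "RangeStart": datetime(year, 1, 1),
            "RangeEnd": datetime(year + 1, 1, 1),
            "Output": initial_output,
        })
    return rows

rows = polling_rows(2017, 2022)
print(len(rows))  # 6
```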

I then created an M query in my dataset called DetectDataChangesQuery that connected to this table, filtered the RangeStart column by the current value of the RangeStart M parameter and the RangeEnd column by the current value of the RangeEnd M parameter, and then returned just the Output column:

let
  Source = Sql.Databases(
    "ThisIsMySQLServerName"
  ),
  IncrementalRefreshDemo = Source
    {[Name = "IncrementalRefreshDemo"]}
    [Data],
  dbo_DetectDataChangesTable
    = IncrementalRefreshDemo
    {
      [
        Schema = "dbo",
        Item = "DetectDataChangesTable"
      ]
    }
    [Data],
  FilterByParams = Table.SelectRows(
    dbo_DetectDataChangesTable,
    each [RangeStart]
      = RangeStart and [RangeEnd]
      = RangeEnd
  ),
  #"Removed Other Columns"
    = Table.SelectColumns(
    FilterByParams,
    {"Output"}
  )
in
  #"Removed Other Columns"

Here’s the output of the query in the Power Query Editor with the RangeStart M parameter set to 1/1/2021 and the RangeEnd M parameter set to 1/1/2022:

The important thing to point out here is that while the documentation says the query must return a scalar value, in fact the query needs to return a table with one column and one row containing a single scalar value.

After publishing the dataset once again, the next thing to do was to set the pollingExpression property described in the documentation. I did this by connecting to the dataset via the XMLA Endpoint using Tabular Editor 3, then clicking on the Sales table and looking in the Refresh Policy section in the Properties pane. I set the property to the name of the query I had just created, DetectDataChangesQuery:

I then forced a full refresh of the Sales table, including all partitions, by running a TMSL script in SQL Server Management Studio and setting the applyRefreshPolicy parameter to false, as documented here. Here’s the TMSL script:

{
  "refresh": {
    "type": "full",
    "applyRefreshPolicy": false,
    "objects": [
      {
        "database": "IncrementalRefreshDetectDataChangesTest",
        "table": "Sales"
      }
    ]
  }
}

Scripting the entire table out to TMSL I could then see the refreshBookmark property on the two partitions (2021 and 2022) which could be refreshed in an incremental refresh set to 1, the value returned for those partitions in the Output column of the DetectDataChangesQuery query:

The refreshBookmark property is important because it stores the value that Power BI compares with the output of the DetectDataChangesQuery query on subsequent dataset refreshes to determine if the partition needs to be refreshed. So, in this case, the value of refreshBookmark is 1 for the 2021 partition, but if in a future refresh DetectDataChangesQuery returns a different value for this partition then Power BI knows it needs to be refreshed.
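Conceptually, the comparison Power BI performs on each refresh can be sketched like this – to be clear, this is purely illustrative Python, not anything Power BI actually runs:

```python
def partitions_to_refresh(bookmarks, polling_output):
    """bookmarks: partition name -> stored refreshBookmark value.
    polling_output: partition name -> value returned by the polling
    query for that partition's RangeStart/RangeEnd.
    A partition is refreshed only when the two values differ."""
    return [
        name
        for name, bookmark in bookmarks.items()
        if polling_output[name] != bookmark
    ]

stored = {"2021": 1, "2022": 1}
polled = {"2021": 2, "2022": 1}   # Output changed for 2021 only
print(partitions_to_refresh(stored, polled))  # ['2021']
```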

I then went back to the DetectDataChangesTable table in SQL and set the Output column to be 2 for the row relating to the 2021 partition:

Next, I went back to SQL Server Management Studio and refreshed the table using a TMSL script with applyRefreshPolicy set to true (which is the default, and what would happen if you refreshed the dataset through the Power BI portal):

{
  "refresh": {
    "type": "full",
    "applyRefreshPolicy": true,
    "objects": [
      {
        "database": "IncrementalRefreshDetectDataChangesTest",
        "table": "Sales"
      }
    ]
  }
}

In the Messages pane of the query window I saw that Power BI had detected the value returned by DetectDataChangesQuery for the 2021 partition had changed, and that therefore the partition needed to be refreshed:

Lower down in the Messages pane the output confirmed that only the 2021 partition was being refreshed:

In Profiler I saw three SQL queries. The first two were to query the DetectDataChangesTable table for the two partitions that might be refreshed to check to see if the value returned in the Output column was different:

select [_].[Output]
from [dbo].[DetectDataChangesTable] as [_]
where ([_].[RangeStart] = convert(datetime2, '2022-01-01 00:00:00') 
and [_].[RangeStart] is not null) 
and ([_].[RangeEnd] = convert(datetime2, '2023-01-01 00:00:00') 
and [_].[RangeEnd] is not null)

select [_].[Output]
from [dbo].[DetectDataChangesTable] as [_]
where ([_].[RangeStart] = convert(datetime2, '2021-01-01 00:00:00') 
and [_].[RangeStart] is not null) 
and ([_].[RangeEnd] = convert(datetime2, '2022-01-01 00:00:00') 
and [_].[RangeEnd] is not null)

The third was to get the data for the 2021 partition, which was the only partition that needed to be refreshed:

select [_].[Date],
    [_].[Sales]
from [dbo].[Sales] as [_]
where [_].[Date] >= convert(datetime2, '2021-01-01 00:00:00') 
and [_].[Date] < convert(datetime2, '2022-01-01 00:00:00')

Finally, scripting the Sales table again to TMSL after the refresh had completed showed that the refreshBookmark property had changed to 2 for the 2021 partition:

And that’s it. I really like this feature but I’ve never seen anyone use it in the real world, which is a shame. Maybe this blog post will inspire someone out there to try it in production?

[UPDATE] An extra point to add is that if you use this functionality, the data source for the partitions must be the same as the data source used for the polling queries, otherwise you’ll get an error.

Yet Another Power BI (And Synapse) Book Roundup

I like free stuff and I like books, so of course I like free books – and it seems that the more I provide free publicity for relevant books here the more free books I get sent. I’ve now got enough to merit writing another post covering those I’ve received recently from various publishers and authors. As always these are not reviews, just short summaries of books you might want to check out.

Microsoft Power BI Data Analyst Certification Guide, by Orrin Edenfield and Edward Corcoran

Studying for a certification is a great way to learn a technology and this book is intended for those studying for the PL-300 Power BI Data Analyst exam. In terms of technical content it’s a good general introduction to Power BI development and administration, so nothing out of the ordinary, but the fact that it’s written to cover the exam syllabus, along with the generous number of practice questions, is the reason to buy it.

Power BI for the Excel Analyst, by Wyn Hopkins

Taking a different angle on learning Power BI is Wyn Hopkins, who has aimed his book at people coming from the Excel community (Wyn himself is a well-known Excel and Power BI MVP who has a great YouTube channel). I’m surprised there aren’t more people writing content like this since the vast majority of people using Power BI come from this background; Rob Collie cornered the market years ago but has been very quiet recently. Once again it’s an introductory guide to Power BI development but there’s a healthy amount of real-world experience inside as well as opinion, which I like – it not only makes the book more valuable but also more readable.

Pro Power BI Dashboard Creation, by Adam Aspin

Adam Aspin is a prolific author of Power BI books and this one focuses on the mechanics of building reports and dashboards. It’s not one of those preachy “data visualisation” books but a guide to the Power BI report canvas, all the visuals and their properties and settings: there’s a whole chapter on drilling up and down, for example. As a result even an experienced Power BI report designer will probably find something in it that they didn’t know.

Azure Synapse Analytics Cookbook, by Gaurav Agarwal and Meenakshi Muralidharan

Not strictly a Power BI book, I know, but a general introduction to Synapse in the worked example/cookbook format – although there is a chapter on how to use Power BI with Synapse. Gaurav is a colleague of mine on the Power BI CAT team at Microsoft so of course I want to call out this new book that he has co-written! My Synapse knowledge is not as good as it should be so I learned a few things reading it.

The Rank Column Transform In Power Query Online

[This post was originally published on the official Power Query blog, which has now been taken down. I’m republishing all my posts there to this blog to ensure the content remains available.]

Yet another new transformation has been added to Power Query Online: you can now add columns containing ranks to a table. To see how to use it, consider the following table containing sales and profit values for various products:

Let’s say you want to add a new column to this table containing the rank of each product, calculated according to its sales. To do this select the Sales column so that it is highlighted (as shown in the screenshot above) and then go to the Add Column tab on the ribbon in Power Query Online and click on the “Rank column” button:

When you do this the following dialog will appear:

Click OK and the dialog will close, and you will see that a new column has been added to your table containing the ranks for each product by sales and that the table itself has been sorted by the values in the Sales column too:

In most cases this will give you the values you want, but by clicking on the Advanced radio button in the Rank dialog you’ll find more options for controlling how the rank is calculated.

First of all, notice that because Oranges and Pears have the same sales value they both have a rank of 2, and that the next product, Apples, has a rank of 4. This is because the default rank method, Standard competition, has been used. There are two other methods of calculating ranks though. Choosing the Dense method from the “Rank method” dropdown on the Advanced tab of the dialog:

…changes the rank value for Apples to 3:

Choosing the Ordinal rank method ignores any tied values like so:

A different way of handling tied values for sales is to consider profit values, and this is also possible on the Advanced tab. Clicking the “Add ranking” button allows you to add another column to sort by, so that when sales is tied then the product with the highest profit will get the highest rank:

Here’s the output:

Notice now that Pears has a rank of 2 and Oranges has a rank of 3 because, even though they have the same sales, Pears has a higher value for Profit.

Behind the scenes this new functionality is powered by a new M function called Table.AddRankColumn which Reza Rad has already blogged about in detail here.
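To make the three rank methods concrete, here is a small illustrative sketch, in Python rather than M (the sales values are made up, chosen to reproduce the tie pattern described above):

```python
def rank(values, method="standard"):
    """Rank values in descending order using the three methods offered
    by the Rank column dialog: 'standard' (Standard competition),
    'dense' or 'ordinal'."""
    ordered = sorted(values, reverse=True)
    ranks = []
    for i, v in enumerate(ordered):
        if i > 0 and v == ordered[i - 1] and method != "ordinal":
            ranks.append(ranks[-1])        # tied value keeps the previous rank
        elif method == "dense":
            ranks.append(ranks[-1] + 1 if ranks else 1)
        else:                              # standard competition or ordinal
            ranks.append(i + 1)
    return list(zip(ordered, ranks))

# Two products tied in second place, as in the fruit example:
print(rank([30, 20, 20, 10], "standard"))  # [(30, 1), (20, 2), (20, 2), (10, 4)]
print(rank([30, 20, 20, 10], "dense"))     # [(30, 1), (20, 2), (20, 2), (10, 3)]
print(rank([30, 20, 20, 10], "ordinal"))   # [(30, 1), (20, 2), (20, 3), (10, 4)]
```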

Power BI DirectQuery Best Practices For Snowflake And Other Databases

Recently I collaborated with a number of people from Microsoft and Snowflake to write a blog post on best practices for using Power BI in DirectQuery mode on Snowflake. You can read it here:

https://medium.com/@jerellegainey/best-practices-for-using-power-bi-in-directquery-mode-with-snowflake-bfd1312ca7ab

It builds on what is already in the Power BI guidance documentation for DirectQuery to add some advice specific to Snowflake. It also has a few other tips that are generally applicable to all DirectQuery sources and which aren’t in the guidance docs (yet), such as the importance of setting the Maximum Connections Per Data Source property (which I also blogged about recently here) and the fact you can increase this to 30 in a Premium capacity, as well as the importance of always testing DirectQuery performance in the Power BI Service rather than in Power BI Desktop. As a result it’s worth reading if you are thinking of using Power BI in DirectQuery mode with Synapse, BigQuery or any other source.

If you are considering using DirectQuery on a project I have one last piece of advice: think very carefully about why you need to use DirectQuery and not Import mode. Many people don’t and end up in trouble – in my opinion Import mode should be the default choice simply because it will almost always perform better than DirectQuery mode, whatever back-end database you’re using.

Gateways And Measuring Power Query CPU Usage During Power BI Dataset Refresh

After last week’s post on measuring Power Query CPU usage during dataset refresh, someone asked an obvious question that I should have addressed: does using a gateway change anything? After all, if you’re using a gateway to connect to an on-premises data source then all the Power Query queries transforming the data from that source will be executed on the gateway machine and not in the Power BI Service.

Let’s do a quick test to find out. I couldn’t use the same Power Query query I used in last week’s post (it turns out you can’t force the use of a gateway when there isn’t an external data source) so instead I used another dataset that connects to a large CSV stored in ADLSgen2 storage and does a group by operation – something which is guaranteed to be very expensive in terms of CPU for Power Query.

Here’s what Profiler shows for the refresh operation when no gateway is used:

The refresh took around 30 seconds and used around 44 seconds of CPU time.

Here’s what Profiler shows when the refresh does use a gateway:

The refresh takes a lot longer, around 103 seconds (as you would expect – instead of loading the data from ADLSgen2 storage in the cloud to the Power BI Service, it has to take a round trip via the gateway on my PC) but the important thing is that the CPU time is now very low – 141 milliseconds.

So, as you might expect, the CPU time for refreshes that use an on-premises data gateway is not shown in Profiler traces because, as I said, all the work done by the Power Query engine is done on the gateway machine and not in the Power BI Service. Making refreshes use a gateway, even when you don’t need to, can be a way of taking load off a Power BI Premium capacity if it’s overloaded.

This in turn raises the question of how you can measure Power Query CPU usage on a gateway. As far as I know it isn’t possible for individual Power Query queries (I could be wrong though), although the gateway logs do allow you to capture CPU usage for the whole machine. Better gateway monitoring tools are on the way, but this seems like a good time to mention my colleague Rui Romano’s open source gateway monitoring solution (article | repo), which makes understanding the gateway logs a lot easier.

Measuring Power Query CPU Usage During Power BI Dataset Refresh

Some time ago I wrote a post about how optimising for CPU Time is almost as important as optimising for Duration in Power BI, especially if you’re working with Power BI Premium Gen2. This is fairly straightforward if you’re optimising DAX queries or optimising Analysis Services engine-related activity for refreshes. But what about Power Query-related activity? You may have a small dataset but if you’re doing a lot of complex transformations in Power Query that could end up using a lot of CPU, even once the CPU smoothing for background activity that happens with Premium Gen2 has happened. How can you measure how expensive your Power Query queries are in terms of CPU? In this post I’ll show you how.

Let’s consider two Power Query queries that return a similar result and which are connected to two different tables in the same Power BI dataset. The first query returns a table with one column and one row, where the only value is a random number returned by the Number.Random M function:

#table(type table [A=number],{{Number.Random()}})

The second query also returns a table with a single value in it:

let
  InitialList = {1 .. 1000000},
  RandomNumbers = List.Transform(
    InitialList,
    each Number.Random()
  ),
  FindMin = List.Min(RandomNumbers),
  Output = #table(
    type table [A = number],
    {{FindMin}}
  )
in
  Output

This second query, however, generates one million random numbers, finds the minimum and returns that value – which of course is a lot slower and more expensive in terms of CPU.

If you run a SQL Server Profiler trace connected to Power BI Desktop and refresh each of the two tables in the dataset separately, the Command End event for the refresh will tell you the duration of the refresh and also the amount of CPU Time used by the Analysis Services engine for the refresh (there will be several Command End events visible in Profiler but only one with any significant activity, so it will be easy to spot the right one). In Desktop, however, the Command End event does not include any CPU used by the Power Query Engine. Here’s what the Command End event for the first Power Query query above looks like in Desktop:

As you would expect the values in both the Duration and CPU Time columns are low. Here is what the Command End event looks like for the second query above:

This time the refresh is much slower (the Duration value is much larger than before) but the CPU Time value is still low, because the Analysis Services engine is still only receiving a table with a single value in it. All the time taken by the refresh is taken in the Power Query engine.
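The distinction between the Duration and CPU Time columns is the familiar one between wall-clock time and processor time, which you can observe in any language; for example, in Python:

```python
import time

# Wall-clock duration vs CPU time for a CPU-bound task, analogous to
# the Duration and CPU Time columns in a Profiler trace.
start_wall = time.perf_counter()
start_cpu = time.process_time()

total = sum(i * i for i in range(1_000_000))  # CPU-bound work

wall = time.perf_counter() - start_wall
cpu = time.process_time() - start_cpu
print(f"duration={wall:.3f}s cpu={cpu:.3f}s")
```

For CPU-bound work like this the two numbers are close; for work that mostly waits on an external system (as the Analysis Services engine does while Power Query runs elsewhere) duration is high while CPU time stays low.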

If you publish a dataset containing these queries to a Premium workspace in the Power BI Service, connect Profiler to the XMLA Endpoint for the workspace, and then refresh the two tables again then for the first, fast query you won’t notice much difference:

[Note that in this screenshot I’ve chosen a comparable Command End event to the one I used in Desktop, although for some reason it doesn’t show the duration. The overall refresh duration, which includes some extra work to do a backup, is around 2 seconds]

However, for the second, slower query you can see that the CPU Time for the Command End event is much higher. This is because in the Power BI Service the event’s CPU Time includes all the Power Query-related activity as well as all Analysis Services engine activity:

This is a simple example where there is very little work being done in the Analysis Services engine, which means that pretty much all the CPU Time can be attributed to the Power Query engine. In the real world, when you’re working with large amounts of data, it will be harder to understand how much work is being done in the Analysis Services engine and how much is being done in the Power Query engine. This is where Power BI Desktop comes in, I think. In Desktop you know you are only seeing the CPU used by the Analysis Services engine, so I’ll bet that if there is a big difference in the ratio of CPU Time to Duration for your refresh in Power BI Desktop compared to the Power BI Service, it’s highly likely that that difference is due to Power Query engine activity and that’s where you should concentrate your optimisation efforts.

Of course the next question is how can you optimise Power Query queries so they use less CPU? I don’t know, I haven’t done it yet – but when I have something useful to share I’ll blog about it…

Monitoring Power Query Online Memory And CPU Usage

Power Query Online is, as the name suggests, the online version of Power Query – it’s what you use when you’re developing Power BI Dataflows for example. Sometimes when you’re building a complex, slow query in the Query Editor you’ll notice a message in the status bar at the bottom of the page telling you how long the query has been running for and how much memory and CPU it’s using:

The duration and CPU values are straightforward, but what does the memory value actually represent? It turns out it’s the “Commit (Bytes)” value documented here for Query Diagnostics, that’s to say the amount of virtual memory being used by the query. That’s different to the “Working Set (Bytes)” value which is the amount of physical memory used by the query, and which is not visible anywhere. For a more detailed discussion of these values in Power Query in Power BI Desktop see this post. The maximum commit or working set for a query evaluation in Power Query Online isn’t officially documented anywhere (and may change) but I can say three things:

  1. The maximum commit is larger than the maximum working set.
  2. If Power Query Online uses more than the maximum working set then query evaluation will get slow, so if your query uses a lot of memory (say, over 1GB – I suspect you’ll only see this message if it is using a lot of memory…) then you need to do some tuning to reduce it. Probably the best way to do this is to look at the query plan for your dataflow and try to avoid any operations marked as “Full Scan”, as documented here.
  3. If your query uses more than the maximum commit then it may get cancelled and you’ll see an error (note that the maximum time a query evaluation can run for in Power Query Online anyway is 10 minutes, which is documented here).

[Thanks to Jorge Gomez Basanta for this information]