Removing Diacritics From Text In Power Query

I faced an interesting challenge recently: I had to use Power Query to remove all non-ASCII characters from some text but maintain the readability of that text. Let’s take the French phrase “un garçon très âgé à Noël” (which I know doesn’t make much sense but is great for testing) as an example. If you remove everything apart from the basic latin alphabet characters and spaces using Text.Select as I blogged here:

Text.Select(
"un garçon très âgé à Noël", 
List.Combine(
{{"A".."Z"},{"a".."z"},{" "}}
)
)

[Rick de Groot has a very detailed post on Text.Select here that’s worth reading]

…then the output is “un garon trs g Nol”. As you can see, removing all the characters leads to unreadable text. Instead, what you have to do is find all the letters with diacritics (accents and other glyphs that can be added to characters) and remove the diacritics. Doing this may be ungrammatical and make it harder to understand the meaning of the text but in most cases the text will still be readable.

The bad news is that there is no straightforward way to do this in Power Query, and indeed there is no straightforward way to do this at all because there are no hard-and-fast rules about what to replace a letter with a diacritic with: should “ö” become “o” or “oe” for example? My first thought was to create a big lookup table with all the rules of what to replace each character with in, similar to the approach taken here for solving this problem in Google Sheets. Building a comprehensive lookup table would be gigantic task though.

Luckily there is another solution (thanks to Curt Hagenlocher for finding this one for me) that is described in this thread on StackExchange: converting the text to binary using the Greek (ISO) code page and converting it back to text again using the Text.FromBinary and Text.ToBinary M functions. I’m not sure I understand why it works but it seems to work well enough for my purpose. Here’s the M code to do this:

Text.FromBinary(
Text.ToBinary(
"un garçon très âgé à Noël", 
28597
)
)

The output of this is “un garcon tres age a Noel”, which is exactly what we want. Of course it’s not perfect and there are cases where characters can’t be converted. If you take the text “Malus × zumi” (it’s a type of crabapple apparently), then:

Text.FromBinary(
Text.ToBinary(
"Malus × zumi", 
28597
)
)

returns the text “Malus ? zumi” – the “×” has been replaced by “?”. As a result you’ll probably also want to replace any question marks with a space and then remove any excess spaces; I’m going to ignore the fact that doing this might also remove any question marks that were in the original text.

Here’s an M custom function that does all this and also solves my original problem of only returning ASCII characters:

(inputText as text) as text =>
  let
    ReplaceDiacritics = Text.FromBinary(Text.ToBinary(inputText, 28597), TextEncoding.Ascii), 
    RemoveQuestionMarks = Text.Replace(ReplaceDiacritics, "?", " "), 
    RemoveExcessSpaces = Text.Combine(
      List.RemoveItems(Text.Split(Text.Trim(RemoveQuestionMarks), " "), {""}), 
      " "
    )
  in
    RemoveExcessSpaces

Dynamic What-If With User-Entered Data In Power BI

My recent post on passing any value to a Power BI dynamic M parameter from a report generated quite a lot of interest but only among people who use DirectQuery mode and who want more flexibility when filtering. In this post, however, I’ll show you an application for this technique that is useful for anyone, regardless of whether they are using DirectQuery mode or Import mode, and which is nothing to do with filtering.

You’re probably familiar with Power BI’s what-if parameter feature which allows end users to pass values into calculations. It’s great but it has one big limitation: all the values that can be used need to be hard-coded in a calculated table at design time. What if (haha) you could let your users enter the values they wanted to use in the report instead? Let’s see a simple example of how to do this.

Here’s an Import mode fact table containing sales values for various products:

The aim is to allow end users to enter one or more percentage values in a report and have these used to calculate growth forecasts on the Sales Amount value.

To do this you need to use the technique from the blog post referenced above to let the users enter the percentage values. First, create an M parameter of type Decimal Number (called EnteredValueParam) in the Power Query Editor:

Then create a dummy query (in this case called EnteredValues) with no rows and a single numeric column to bind the parameter to. Here’s the M code for this:

#table(
  type table [EnteredValues = number],
  {}
)

Now comes the fun part. Rather than use these entered values to filter a SQL query, they will be used to generate a SQL statement that returns the same values in the form of a disconnected, DirectQuery table connected to a relational database (SQL Server in this case). Here’s the M code for a query, in this example called ReturnedValues, that does this:

let
  //Check if the M parameter returns a list                                         
  //If it doesn't, turn it into a list                                    
  FilterList =
    if Type.Is(
      Value.Type(EnteredValueParam),
      List.Type
    )
    then
      EnteredValueParam
    else
      {EnteredValueParam},
  //Generate all the individual SELECT statements                                               
  GenerateSQLQueries = List.Transform(
    FilterList,
    each "SELECT '"
      & Text.From(_)
      & "%' AS Forecast, "
      & Text.From(_)
      & " AS Pct"
  ),
  //Concatenate all the SELECT statements with UNION ALL into a single statement                                                                              
  UnionAll = Text.Combine(
    GenerateSQLQueries,
    " UNION ALL "
  ),
  //Connect to the SQL Server database and run the query                                                      
  Source = Sql.Database(
    "localhost",
    "AdventureWorksDW2017"
  ),
  RunQuery = Value.NativeQuery(
    Source,
    UnionAll
  )
in
  RunQuery

If the parameter EnteredValueParam is passed the single value 1, the ReturnedValue query generates the following SQL query and runs it:

SELECT '1%' AS Forecast, 1 AS Pct

Here’s what the ReturnedValue query returns:

If the parameter EnteredValueParam returns the list {1,25,50,75}, the ReturnedValue query generates the following SQL query and runs it:

SELECT '1%' AS Forecast, 1 AS Pct 
UNION ALL 
SELECT '25%' AS Forecast, 25 AS Pct 
UNION ALL 
SELECT '50%' AS Forecast, 50 AS Pct 
UNION ALL 
SELECT '75%' AS Forecast, 75 AS Pct

Here’s what the ReturnedValue query returns in this case:

At this point, with all the queries loaded, the dataset looks like this:

The ReturnedValues table is in DirectQuery mode; the Sales table is in Import mode; the Entered Values table is in Import mode and the EnteredValues column on that table is bound to the EnteredValueParam M parameter, which has the Multi-select property turned on.

With all this in place you can create a measure on the Sales table which takes the sum of Sales Amount and applies a percentage growth based on the selected value in the Pct column of the ReturnedValues table:

Sales Amount Forecasts =
SUM ( Sales[SalesAmount] )
    * (
        ( SELECTEDVALUE ( ReturnedValues[Pct] ) / 100 ) + 1
    )

Finally, here’s a report page that uses the Filter By List custom visual to allow end users to enter the percentage forecast values and has a matrix to display the measure above with the Product field from the Sales table on rows and the Forecast field from the ReturnedValues table on columns:

Here it is in action:

It’s a bit of a pain that you need a DirectQuery data source in order to be able to do this but it’s not too difficult to set one up and it won’t need much in the way of resources. I think this technique (which I hereby name “Dynamic DirectQuery Disconnected Dimension Tables”) could be useful in other scenarios too, such as ABC analysis. If you can think of other ways to use this let me know by leaving a comment below!

The “Block length does not match with its complement” Error In Power Query In Power BI And Excel

When using the Web.Contents function to call a web service in Power Query in either Power BI or Excel you may encounter the following error:

DataFormat.Error: Block length does not match with its complement.

Here’s an example of an M query that calls a web service and, at the time of writing, gives this error:

Web.Contents(
  "https://api.open-meteo.com/v1/forecast?latitude=52.52&longitude=13.41&hourly=temperature_2m"
)

The problem is caused by something to do with how some web services handle “deflate” compression; apparently the same issue exists in the equivalent native .NET functionality. There are two ways to avoid it though.

First of all, the easy way: ask the web service to use “gzip” compression instead. You can do this by setting the “Accept-Encoding” header to “gzip” like so:

Web.Contents(
  "https://api.open-meteo.com/v1/forecast?latitude=52.52&longitude=13.41&hourly=temperature_2m",
  [
    Headers = [
      #"Accept-Encoding" = "gzip"
    ]
  ]
)

Having done this the error will no longer occur and you can go on to handle the data returned by the web service (which is a JSON document) as normal.

The second is to use the Web.BrowserContents function instead of Web.Contents:

Web.BrowserContents(
  "https://api.open-meteo.com/v1/forecast?latitude=52.52&longitude=13.41&hourly=temperature_2m"
)

This is more complicated because the Web.BrowserContents function returns the HTML for a web page as viewed by a web browser. Apart from issues like handling authentication you’ll need to parse the result to get the data you need:

It’s doable using the Html.Table function though. Here’s an example of how to handle the response for this particular web service call:

let
  Source = Web.BrowserContents(
    "https://api.open-meteo.com/v1/forecast?latitude=52.52&longitude=13.41&hourly=temperature_2m"
  ),
  HTMLTable = Html.Table(
    Source,
    {{"JSONPayload", "pre"}}
  ),
  JSON = Json.Document(
    HTMLTable{0}[JSONPayload]
  )
in
  JSON

[Thanks to Thais Marshall for bringing me this problem, and to Curt Hagenlocher for explaining and it showing me the first workaround]

Passing Any Arbitrary Value From A Power BI Report To A Dynamic M Parameter

Dynamic M parameters are incredibly useful in Power BI when using DirectQuery mode because they allow you to pass values from your report direct to your data source query, bypassing Power BI’s own SQL (or whatever query language your source uses) generation layer. However it isn’t obvious how to pass any value you want: dynamic M parameters must be bound to slicers or filters, and they must be bound to tables in your dataset, which means that at first glance it seems like you can only pass values that are already present somewhere in your dataset (for example in a dimension table) to a dynamic M parameter. This isn’t true though: there is a way to allow passing of any value your users enter to a dynamic M parameter and in this post I’ll show you how.

First of all, why is this useful? There are two reasons why you might want to allow passing of any value to a dynamic M parameter rather than binding to a slicer or filter in the normal way:

Displaying the list of possible values in a slicer or filter can be expensive, especially if the column you’re slicing on contains thousands or millions of values. It can slow down your report and lead to extra queries being run on your DirectQuery source, which can lead to performance problems.
Sometimes the values you want to pass in don’t exist in your DirectQuery source. The scenarios where this is useful are very similar to the scenarios where you’d want to use what-if parameters, but the big limitation of what-if parameters is that you have to pre-calculate all the values that a user might ever want to select and store them in a table. What happens if you don’t want to, or can’t, pre-calculate all the input values?

Let’s see an example of how you can pass any value you want to a dynamic M parameter.

The AdventureWorksDW2017 sample SQL Server database has a table called FactInternetSales with a column called SalesOrderNumber, and let’s say you want to build a Power BI report where an end user can enter whatever SalesOrderNumbers they want and filter the table by them. Let’s also assume that we have to use dynamic M parameters to do this efficiently (which isn’t the case with AdventureWorksDW2017 but could well be in the real world). Here’s what the data looks like:

To set things up in the Power Query Editor you need an M parameter to hold the values entered by the end user. In this example the parameter is called SalesOrderNumber:

Note that the Current Value property is set to a value that is not a valid sales order number, so when no input has been received from the users then no rows will be returned by the report.

Next you need a query that filters the FactInternetSales table by this parameter. Here’s the M code:

let
  Source = Sql.Database(
    "localhost",
    "AdventureWorksDW2017"
  ),
  dbo_FactInternetSales = Source
    {
      [
        Schema = "dbo",
        Item   = "FactInternetSales"
      ]
    }
    [Data],
  #"Removed Other Columns"
    = Table.SelectColumns(
    dbo_FactInternetSales,
    {
      "OrderDateKey",
      "SalesOrderNumber",
      "SalesOrderLineNumber",
      "SalesAmount"
    }
  ),
  FilterList =
    if Type.Is(
      Value.Type(SalesOrderNumber),
      List.Type
    )
    then
      SalesOrderNumber
    else
      {SalesOrderNumber},
  #"Filtered Rows" = Table.SelectRows(
    #"Removed Other Columns",
    each List.Contains(
      FilterList,
      [SalesOrderNumber]
    )
  )
in
  #"Filtered Rows"

There are two interesting things to notice here:

This code handles the cases where a user enters a single value, in which case the SalesOrderNumber M parameter will be of type text, or when the user enters multiple values in which case the SalesOrderNumber M parameter will be of type list. For more details on handling multi-select in dynamic M parameters see this post.
The actual filtering is done using the List.Contains M function, which does fold on SQL Server-related data sources. If you’re using other sources you should check if query folding happens for List.Contains for your source.

Thirdly, you need a dummy dimension table with a single column for the dynamic M parameter to be bound to in the report. The dimension table shouldn’t contain any data; here’s the M code to use (the query is called DimSalesOrderNumber):

let
  Source = #table(
    type table [SalesOrderNumber = text],
    {}
  )
in
  Source

This query returns a table with a single text column called SalesOrderNumber and no rows:

Once you’ve left the Power Query Editor the next thing to do is to bind the SalesOrderNumber M parameter to the SalesOrderNumber column of the DimSalesOrderNumber table:

Note that the Multi-select property has been enabled. Binding the dynamic M parameter to a table with no rows in means there’s no way a regular slicer could be used with it, because there are no values in the table for the slicer to display.

The last problem to solve is the important one: how do you allow end users to enter any value they want? There are two ways I know of. One is to use the filter pane and the “is” filter type under “Advanced filter”:

The filter pane is great because it’s built-in but it only allows the user to enter one or two (if they use the “Or” option) values to filter on. Remember also that not all filter types are available when you’re using dynamic M parameters.

A better approach, if you’re ok with using custom visuals, is to use the Filter By List custom visual which allows the end user to enter – or even copy/paste in – a list of values:

Understanding The “A cyclic reference was encountered during evaluation” Error In Power Query In Power BI Or Excel

When working in the Power Query Editor in Power BI or Excel, especially when you’re editing M code, you may run into the following error:

Expression.Error: A cyclic reference was encountered during evaluation.

What does this message mean and what causes this error? In this blog post I’ll explain.

The first step to understanding what’s going on here is to understand how Power Query evaluates the steps in a query. I wrote a blog post on let expressions in M some time ago, which you can read here, and which goes into a lot of detail on this subject but the basics are quite easy to understand. Conside the following M query:

let
    Step1 = 1,
    Step2 = Step1 + 5,
    Step3 = Step2*2
in
    Step3

If you paste this into a new blank query in the Power Query Editor you’ll see it results in three steps in the Applied Steps on the right hand side of the screen:

The query itself, as you would expect, returns the value 12:

The important thing is to know that Power Query does not evaluate the steps in the order that they appear here – so it does not evaluate Step1 first, then Step2 and then Step3. What it does is evaluate Step3 first, which is needed for the output of the query, and then because it needs the value of Step2 to do this it evaluates Step2 next, and then because it needs the value of Step1 to evaluate Step2 it evaluates Step1 last. You can visualise the chain of dependencies between these steps like so:

Now consider the following M query:

let
    Step1 = Step3,
    Step2 = Step1 + 5,
    Step3 = Step2*2
in
    Step3

The change is that Step1 is now set to the value of Step3 instead of the value 1. This query gives you the “A cyclic reference was encountered during evaluation” error and you can probably see why: the query returns the value of Step3, which needs the value of Step2, which needs the value of Step1, which needs the value of Step3 – so there’s no way of calculating the output of the query. There’s a loop in the dependencies between the steps:

This is the cyclic reference (sometimes known as a “circular reference”) from the error message. As you can imagine it’s very easy to create cyclic references when you’re editing M code manually and changing step names.

It’s also possible to run into this error when using referenced queries, that’s to say when one Power Query query uses the output of another Power Query query as its data source. For example, let’s say you had a query called Query1 with the following M code:

let
    Source = Query2
in
    Source

…and a query called Query2 with the following M code:

let
    Source = Query1
in
    Source

Once again there is a cyclic reference: Query1 refers to the output of Query2 and Query2 refers to the output of Query1. The error message returned for Query2 is:

An error occurred in the ‘Query1’ query. Expression.Error: A cyclic reference was encountered during evaluation.

An extra problem in this case is that if you try to delete just one of these queries, you’ll get the following error message:

The query “Query1” cannot be deleted because it’s being referenced by another query: “Query2”.

You have to select both queries and then delete them, or of course edit one of the queries to break the cyclic reference.

One last thing to mention: recursion is possible in M, meaning that you can write a custom function that calls itself (Miguel Escobar has a detailed post on this here) but it can be very hard to get it working and it’s not always the most efficient way to solve a problem.

Using OData.Feed And The Query Option To Avoid The Dynamic Data Sources Error In Power BI

In my last post I showed how, in many cases, you can avoid the “dynamic data sources” error with OData data sources by taking advantage of query folding. That’s not always possible though and in this post I’ll show you how you can use the Query option of the OData.Feed function to do so instead.

At first glance the Query option of OData.Feed looks very much like the Query option on Web.Contents which, of course, can also be used to avoid the dynamic data sources error (see here and here) and that’s true up to a point: you can use it to add query parameters to your OData URL. However the documentation is not particularly detailed and there is one thing that will confuse you when you try to use it: you can’t use it with OData system query options like $filter. For example, let’s say you wanted to query the People entity in the TripPin sample OData endpoint to get only the people whose first name was Scott. You can do this as follows:

let
  Source = OData.Feed(
    "https://services.odata.org/TripPinRESTierService/People?$filter=FirstName eq 'Scott'",
    null,
    [Implementation = "2.0"]
  )
in
  Source

Since $filter is a url query parameter, you might then try the following code:

let
  Source = OData.Feed(
    "https://services.odata.org/TripPinRESTierService/People",
    null,
    [
      Implementation = "2.0",
      Query = [
        #"$filter"
          = "FirstName eq 'Scott'"
      ]
    ]
  )
in
  Source

Unfortunately, this returns the following error:

Expression.Error: OData.Feed custom query options cannot start with ‘$’.

This is Power Query’s way of telling you you can’t use OData system query options with the OData.Feed Query option, since they always start with a $ sign. This is a deliberate design decision on the part of the Power Query team; I won’t go into the reasons why it is this way, it’s complicated!

However, you can use the Query option with OData custom query options like so:

let
  Source = OData.Feed(
    "https://services.odata.org/TripPinRESTierService/People",
    null,
    [
      Implementation = "2.0",
      Query = [#"debug-mode" = "true"]
    ]
  )
in
  Source

More importantly, you can use the Query option with OData parameter aliases. The original URL we were looking at:

https://services.odata.org/TripPinRESTierService/People?$filter=FirstName eq 'Scott'

…can be rewritten to use parameter aliases like so:

https://services.odata.org/TripPinRESTierService/People?$filter=FirstName eq @fn&@fn='Scott'

…and the rewritten version can be called using the Query option like so:

let
  Source = OData.Feed(
    "https://services.odata.org/TripPinRESTierService/People?$filter=FirstName eq @f", 
    null, 
    [
      Implementation = "2.0", 
      Query = [#"@f" = "'Scott'"]
    ]
  )
in
  Source

You can use parameter aliases to call OData unbound functions too. For example, you can call TripPin’s GetNearestAirport function like so:

let
  Source = OData.Feed(
    "https://services.odata.org/TripPinRESTierService/GetNearestAirport(lat = @latitude, lon = @longitude)", 
    null, 
    [
      Implementation = "2.0", 
      Query = [
        #"@latitude"  = "100", 
        #"@longitude" = "0"
      ]
    ]
  ), 
  #"Converted to Table" = Record.ToTable(
    Source
  )
in
  #"Converted to Table"

Finally, here’s an example of a query that starts with a table of latitudes and longitudes and calls the GetNearestAirport function for each row:

let
  Source = #table(
    type table [
      LocationLatitude = text,
      LocationLongitude = text
    ],
    {
      {"10", "10"},
      {"33", "-118"},
      {"100", "-100"}
    }
  ),
  #"Added Custom" = Table.AddColumn(
    Source,
    "FunctionCall",
    each OData.Feed(
      "https://services.odata.org/TripPinRESTierService/GetNearestAirport(lat = @latitude, lon = @longitude)",
      null,
      [
        Implementation = "2.0",
        Query = [
          #"@latitude" = [
            LocationLatitude
          ],
          #"@longitude" = [
            LocationLongitude
          ]
        ]
      ]
    )
  ),
  #"Expanded FunctionCall"
    = Table.ExpandRecordColumn(
    #"Added Custom",
    "FunctionCall",
    {"Name", "IcaoCode", "IataCode"},
    {
      "FunctionCall.Name",
      "FunctionCall.IcaoCode",
      "FunctionCall.IataCode"
    }
  )
in
  #"Expanded FunctionCall"

When you first publish a dataset with this query in to the Power BI Service you’ll see an internal server error message coming from the TripPin OData service. You can make the dataset refresh successfully and avoid this and the dynamic data sources error though: you need to check the Skip Test Connection box in the credentials dialog you can open from the Settings pane and set the data privacy level on the data source that is sending data to the OData function appropriately too:

Web.Contents, Text Concatenation And Dataset Refresh Errors In Power BI/Power Query

One of the most popular topics I’ve ever written about here is the dataset refresh errors you get with dynamic data sources and the Web.Contents function in Power BI (see here and here for example). Recently I learned about an interesting new detail regarding this problem that I thought was worth blogging about, and which may help someone trying to troubleshoot it.

Consider the following M query, used as the source of a table in a Power Query dataset:

let
  Term = "apples",
  Source = Json.Document(
    Web.Contents(
      Text.Combine(
        {
          "https://data.gov.uk/api/3/action/package_search?q=",
          Term
        }
      )
    )
  ),
  CallSucceeded = #table(
    type table [CallSucceeded = logical],
    {{Source[success]}}
  )
in
  CallSucceeded

[The API used here is public and doesn’t require authentication so you can run this query yourself]

This query refreshes successfully in Power BI Desktop but when you publish to the Power BI Service and go to the dataset Settings page you see the following error:

This dataset includes a dynamic data source. Since dynamic data sources aren’t refreshed in the Power BI service, this dataset won’t be refreshed. Learn more: https://aka.ms/dynamic-data-sources. Data source for Query1

Why is this a case of a dynamic data source? Unlike the examples in my previous post there is only one, known, value being passed to the API, but nevertheless the use of the Text.Combine M function is enough to prevent the Power Query engine from doing the static analysis it needs to do to allow refresh to take place. As documented here, the Power Query engine needs to be able to work out the URL it’s going to call before the query is run and in this case it can’t.

One way of solving this problem, the way I knew would work, is to use the Query option with Web.Contents like so:

let
  Term = "apples",
  Source = Json.Document(
    Web.Contents(
      "https://data.gov.uk/api/3/action/package_search",
      [Query = [q = Term]]
    )
  ),
  CallSucceeded = #table(
    type table [CallSucceeded = logical],
    {{Source[success]}}
  )
in
  CallSucceeded

What I didn’t know – or rather I did, but didn’t really understand – is that if all you want to do is concatenate text to build your URL you can use the & operator rather than Text.Combine and it will work because Power Query is smart enough to understand that during static analysis. So for example:

let
  Term = "apples",
  Source = Json.Document(
    Web.Contents(
      "https://data.gov.uk/api/3/action/package_search?q="
        & Term
    )
  ),
  CallSucceeded = #table(
    type table [CallSucceeded = logical],
    {{Source[success]}}
  )
in
  CallSucceeded

…will also successfully refresh in the Power BI Service because it uses the & operator. This is the way I would naturally write the code and I’m sure I’ve done this hundreds of times in the past without realising that concatenating text any other way will lead to errors.

[Thanks to Curt Hagenlocher for this information]

New Web.Headers M Function

[This post was originally published on the official Power Query blog, which has now been taken down. I’m republishing all my posts there to this blog to ensure the content remains available.]

You may have noticed a new M function was added to Power Query recently: Web.Headers. This function allows you to make HEAD requests to web services – the existing Web.Contents function, which it resembles closely, only allows you to make GET and POST requests. You can use the Web.Headers function to return the HTTP headers that would be returned if the URL passed to it was instead used to make a GET request.

For example, the following M query returns a record with the HTTP headers returned by a HEAD request to the URL”https://bing.com/search?q=Power+Query” :

let
    searchText = "Power Query",
    output = Web.Headers(
                "https://www.bing.com",
                [
                    RelativePath = "search",
                    Query = [q = searchText]
                ]
            )
in 
    output

The metadata of the record returned contains the HTTP response status code for the request. For example:

let
    searchText = "Power Query",
    output = Web.Headers(
                "https://www.bing.com",
                [
                    RelativePath = "search",
                    Query = [q = searchText]
                ]
            )
in 
    Value.Metadata(output)

…returns a record with a record containing the field Response.Status:

Custom Queries For “Detect Data Changes” In Power BI Incremental Refresh

One feature of Power BI incremental refresh I’ve always been meaning to test out is the ability to create your own M queries to work with the “detect data changes” feature, and last week I finally had the chance to do it. The documentation is reasonably detailed but I thought it would be a good idea to show a worked example of how to use it to get direct control over what data is refreshed during an incremental refresh.

First of all I created a simple dataset with incremental refresh enabled. The source was a SQL Server table with two columns: Date (actually a datetime column) and Sales.

I then configured incremental refresh as follows:

In the background this created six yearly partitions:

Nothing interesting here so far, but the real challenge lies ahead: how exactly do you use custom queries with “detect data changes”?

I created a new table in my SQL Server database called DetectDataChangesTable with one row for every partition in the dataset (even though the incremental refresh configuration above means only the 2021 and 2022 partitions will ever be refreshed) and the values for the RangeStart and RangeEnd M parameters that would be set when each partition is refreshed:

I then created an M query in my dataset called DetectDataChangesQuery that connected to this table, filtered the RangeStart column by the current value of the RangeStart M parameter and the RangeEndColumn by the current value of the RangeEnd M parameter, and then returned just the Output column:

let
  Source = Sql.Databases(
    "ThisIsMySQLServerName"
  ),
  IncrementalRefreshDemo = Source
    {[Name = "IncrementalRefreshDemo"]}
    [Data],
  dbo_DetectDataChangesTable
    = IncrementalRefreshDemo
    {
      [
        Schema = "dbo",
        Item = "DetectDataChangesTable"
      ]
    }
    [Data],
  FilterByParams = Table.SelectRows(
    dbo_DetectDataChangesTable,
    each [RangeStart]
      = RangeStart and [RangeEnd]
      = RangeEnd
  ),
  #"Removed Other Columns"
    = Table.SelectColumns(
    FilterByParams,
    {"Output"}
  )
in
  #"Removed Other Columns"

Here’s the output of the query in the Power Query Editor with the RangeStart M parameter set to 1/1/2021 and the RangeEnd M parameter set to 1/1/2022:

The important thing to point out here is that while the documentation says the query must return a scalar value, in fact the query needs to return a table with one column and one row containing a single scalar value.

After publishing the dataset once again, then next thing to do was to set the pollingExpression property described in the documentation. I did this by connecting to the dataset via the XMLA Endpoint using Tabular Editor 3, then clicking on the Sales table and looking in the Refresh Policy section in the Properties pane. I set the property to the name of the query I just created, DetectDataChangesQuery:

I then forced a full refresh of the Sales table, including all partitions, by running a TMSL script in SQL Server Management Studio and setting the applyRefreshPolicy parameter to false, as documented here. Here’s the TMSL script:

{
  "refresh": {
    "type": "full",
	"applyRefreshPolicy": false,
    "objects": [
      {
        "database": "IncrementalRefreshDetectDataChangesTest",
        "table": "Sales"
      }
    ]
  }
}

Scripting the entire table out to TMSL I could then see the refreshBookmark property on the two partitions (2021 and 2022) which could be refreshed in an incremental refresh set to 1, the value returned for those partitions in the Output column of the DetectDataChangesQuery query:

The refreshBookmark property is important because it stores the value that Power BI compares with the output of the DetectDataChangesQuery query on subsequent dataset refreshes to determine if the partition needs to be refreshed. So, in this case, the value of refreshBookmart is 1 for the 2021 partition but if in a future refresh the DetectDataChangesQuery returns a different value for this partition then Power BI knows it needs to be refreshed.

I then went back to the DetectDataChangesTable table in SQL and set the Output column to be 2 for the row relating to the 2021 partition:

Next, went back to SQL Server Management Studio and refreshed the table using a TMSL script with applyRefreshPolicy set to true (which is the default, and what would happen if you refreshed the dataset through the Power BI portal).

{
  "refresh": {
    "type": "full",
	"applyRefreshPolicy": true,
    "objects": [
      {
        "database": "IncrementalRefreshDetectDataChangesTest",
        "table": "Sales"
      }
    ]
  }
}

In the Messages pane of the query window I saw that Power BI had detected the value returned by DetectDataChangesQuery for the 2021 partition had changed, and that therefore the partition needed to be refreshed:

Lower down in the Messages pane the output confirmed that only the 2021 partition was being refreshed:

In Profiler I saw three SQL queries. The first two were to query the DetectDataChangesTable table for the two partitions that might be refreshed to check to see if the value returned in the Output column was different:

select [_].[Output]
from [dbo].[DetectDataChangesTable] as [_]
where ([_].[RangeStart] = convert(datetime2, '2022-01-01 00:00:00') 
and [_].[RangeStart] is not null) 
and ([_].[RangeEnd] = convert(datetime2, '2023-01-01 00:00:00') 
and [_].[RangeEnd] is not null)

select [_].[Output]
from [dbo].[DetectDataChangesTable] as [_]
where ([_].[RangeStart] = convert(datetime2, '2021-01-01 00:00:00') 
and [_].[RangeStart] is not null) 
and ([_].[RangeEnd] = convert(datetime2, '2022-01-01 00:00:00') 
and [_].[RangeEnd] is not null)

The third was to get the data for the 2021 partition, which was the only partition that needed to be refreshed:

select [_].[Date],
    [_].[Sales]
from [dbo].[Sales] as [_]
where [_].[Date] >= convert(datetime2, '2021-01-01 00:00:00') 
and [_].[Date] < convert(datetime2, '2022-01-01 00:00:00')

Finally, scripting the Sales table again to TMSL after the refresh had completed showed that the refreshBookmark property had changed to 2 for the 2021 partition:

And that’s it. I really like this feature but I’ve never seen anyone use this in the real world though, which is a shame. Maybe this blog will inspire someone out there to try it in production?

[UPDATE] An extra point to add is that if you use this functionality, the data source for the partitions must be the same as the data source used for the polling queries, otherwise you’ll get an error.

The Rank Column Transform In Power Query Online

[This post was originally published on the official Power Query blog, which has now been taken down. I’m republishing all my posts there to this blog to ensure the content remains available.]

Yet another new transformation has been added to Power Query Online: you can now add columns containing ranks to a table. To see how to use it, consider the following table containing sales and profit values for various products:

Let’s say you want to add a new column to this table containing the rank of each product, calculated according to its sales. To do this select the Sales column so that it is highlighted (as shown in the screenshot above) and then go to the Add Column tab on the ribbon in Power Query Online and click on the “Rank column” button:

When you do this the following dialog will appear:

Click OK and the dialog will close, and you will see that a new column has been added to your table containing the ranks for each product by sales and that the table itself has been sorted by the values in the Sales column too:

In most cases this will give you the values you want, but by clicking on the Advanced radio button in the Rank dialog you’ll find more options for controlling how the rank is calculated.

First of all, notice that because Oranges and Pears have the same sales value they both have a rank of 2 and that the next product, Apples, has a rank of 4. This is because the default rank method, Standard competition, has been used. There are there are two other methods of calculating ranks though. Choosing the Dense method in the Advanced tab of the dialog from the “Rank method” dropdown:

…changes the rank value for Apples to 3:

Choosing the Ordinal rank method ignores any tied values like so:

A different way of handling tied values for sales is to consider profit values, and this is also possible on the Advanced tab. Clicking the “Add ranking” button allows you to add another column to sort by, so that when sales is tied then the product with the highest profit will get the highest rank:

Here’s the output:

Notice now that Pears has a rank of 2 and Oranges has a rank of 3 because, even though they have the same sales, Pears has a higher value for Profit.

Behind the scenes this new functionality is powered by a new M function called Table.AddRankColumn which Reza Rad has already blogged about in detail here.

Category: M

Removing Diacritics From Text In Power Query

Like this:

Dynamic What-If With User-Entered Data In Power BI

Like this:

The “Block length does not match with its complement” Error In Power Query In Power BI And Excel

Like this:

Passing Any Arbitrary Value From A Power BI Report To A Dynamic M Parameter

Like this:

Understanding The “A cyclic reference was encountered during evaluation” Error In Power Query In Power BI Or Excel

Like this:

Using OData.Feed And The Query Option To Avoid The Dynamic Data Sources Error In Power BI

Like this:

Web.Contents, Text Concatenation And Dataset Refresh Errors In Power BI/Power Query

Like this:

New Web.Headers M Function

Like this:

Custom Queries For “Detect Data Changes” In Power BI Incremental Refresh

Like this:

The Rank Column Transform In Power Query Online

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this: