Data Wrangler: A Power Query-Like Experience For Python People

If, like me (and Ruth), you spend your life in Power BI but have a lingering feeling that you should get round to learning Python for data analysis sometime then here’s something you should check out: the new Data Wrangler extension for Visual Studio Code. All the details are in the announcement blog post:

https://devblogs.microsoft.com/python/data-wrangler-release/

…and this video is a great introduction to what it does:

Why is it interesting for someone like me? Because it works in a very, very similar way to Power Query – except that instead of generating M code in the background, it generates Python. It doesn’t have the same amount of functionality that Power Query does and the UI is a bit more basic but anyone with Power Query experience will feel immediately at home. I got it up and running very easily and I can see that it will be great for anyone learning Python or who needs a productivity boost.

DirectQuery Parallelisation In Power BI – Some Examples

Recently we announced an important new optimisation for DirectQuery datasets: the ability to run (some) of the queries generated by a single DAX query in parallel. You can read the blog post here:

https://powerbi.microsoft.com/en-za/blog/query-parallelization-helps-to-boost-power-bi-dataset-performance-in-directquery-mode/

A few of us on the Power BI CAT team have tested this out with customers and seen some great results, so I thought I’d write a post illustrating the effect this optimisation can have and explaining when it can and can’t help.

For the purposes of this blog post I built a DirectQuery dataset (the source and the design of the dataset are irrelevant) using the New York Taxi data. I then created a table with five measures on columns; the precise definitions of the measures don’t matter much either, except for the fact that they each generate a single Storage Engine request and horizontal fusion cannot combine these requests in any way.

Next, I published the dataset to the Power BI Service, connected to it from Tabular Editor, set the compatibility level property to 1569 and saved the change:

Having done this the Max Parallelism Per Query property became visible, and I set it to 1:

1 is not the default value for this property but it does give the same behaviour as the default at the time of writing – it ensures that the Storage Engine requests for a single DAX query are always executed one at a time.

Next I connected DAX Studio up to the XMLA Endpoint for the workspace and ran the DAX query generated by the table visual above with Server Timings enabled. The new timeline view in DAX Studio 3.0.6 (see Marco’s video for more details) does a great job of showing when the Storage Engine queries – in this case the SQL queries sent to the data source – are made:

As you can see, in this case it’s clear that the six Storage Engine queries/SQL queries generated by this one DAX query are all executed in sequence. Also notice that the duration of the DAX query was 3 seconds.

I then changed the Max Parallelism Per Query property to 6 and reran the same query from DAX Studio. Here’s what the timeline looked like:

The Storage Engine queries/SQL queries now all run in parallel and the overall duration of the DAX query is just 0.9 seconds!

This is great, but as always there are a few things that need to be highlighted. First, Power BI may not be able to run all the Storage Engine queries in parallel if there are certain types of dependency in your DAX. In the example above there were no dependencies between the measures – none of them referenced any other measures, and they all used fairly simple expressions – but in the real world that might not be the case. To illustrate this I created a new measure with the following definition:

MEASURE WithDependencies =
    IF (
        CALCULATE ( [Distinct Medallions], ALL ( 'GEOGRAPHY'[STATE] ) ) > 0,
        [Trip Count],
        [Monday Trips]
    )

I then used it in a new table visual and captured the DAX query generated:

Here’s what the DAX Studio Server Timings showed for this query with Max Parallelism Per Query set to 6:

This time there is some parallelism but the first Storage Engine query/SQL query has to complete before the last two can be executed. I wrote this measure specifically to get this behaviour so you may be able to rewrite your DAX to get better parallelism, but it’s something to be aware of.

One last thing to mention: increasing parallelism here may lead to worse query performance overall if you end up running into bottlenecks elsewhere in your architecture (see here for an example). You may need to increase the “maximum number of connections per data source” property on your dataset to allow more queries to run in parallel, and if that results in additional load on the data source then you may need to scale up or scale out there too. Remember also that the DAX queries for a report page are, and always have been run in parallel so the Storage Engine queries for different DAX queries will also be run in parallel; you’ll need to watch out for report pages with a large number of visuals on them.

ApproximateDistinctCount DAX Function Now Works On More DirectQuery Sources

Some good news for those of you using DirectQuery mode in Power BI: the ApproximateDistinctCount DAX function, which returns an estimate of the number of the distinct values in a column and which can be a lot faster than a true distinct count as returned by the DistinctCount function, is now available to use with BigQuery, Databricks and Snowflake sources. It only worked with Azure SQL DB and Synapse before; RedShift is coming soon. You can use it in exactly the same way that you would with the DistinctCount function except that it only works in DirectQuery mode.

For example, I have a Power BI DirectQuery dataset that uses the New York Taxi data in Snowflake as its source. With the following two DAX measures defined on the Trip table:

Approximate Distinct Medallions = APPROXIMATEDISTINCTCOUNT('TRIP'[MEDALLIONID])
Distinct Medallions = DISTINCTCOUNT('TRIP'[MEDALLIONID])

…I can build a table visual that compares the output of the two measures (as you can see, the difference isn’t that big):

…and see that the ApproximateDistinctCount DAX function is translated to the APPROX_COUNT_DISTINCT function in Snowflake SQL:

select { fn convert(count(distinct("MEDALLIONID")), SQL_DOUBLE) } + { fn convert(max("C1"), SQL_DOUBLE) } as "C1",
    approx_count_distinct("MEDALLIONID") as "C2"
from 
(
    select "DATEID",
        "MEDALLIONID",
        "HACKNEYLICENSEID",
        "PICKUPTIMEID",
        "DROPOFFTIMEID",
        "PICKUPGEOGRAPHYID",
        "DROPOFFGEOGRAPHYID",
        "PICKUPLATITUDE",
        "PICKUPLONGITUDE",
        "PICKUPLATLONG",
        "DROPOFFLATITUDE",
        "DROPOFFLONGITUDE",
        "DROPOFFLATLONG",
        "PASSENGERCOUNT",
        "TRIPDURATIONSECONDS",
        "TRIPDISTANCEMILES",
        "PAYMENTTYPE",
        "FAREAMOUNT",
        "SURCHARGEAMOUNT",
        "TAXAMOUNT",
        "TIPAMOUNT",
        "TOLLSAMOUNT",
        "TOTALAMOUNT",
        case
            when "MEDALLIONID" is null
            then CAST(1 as INTEGER)
            else CAST(0 as INTEGER)
        end as "C1"
    from "NYCDATA_DB"."NYCTAXIDATA"."TRIP"
) as "ITBL"

Distinct counts are often the slowest type of measure but in my experience report consumers are very unwilling to accept seeing “near enough” numbers in their reports rather than numbers that are 100% accurate, even if the approximate distinct counts are much faster. I heard someone suggest using field parameters to allow report consumers to switch between showing fast approximate distinct counts for exploration and accurate distinct counts when they really need them, and I think this is a great compromise.

Bonus links: if you need to do an approximate distinct count in Import mode, Phil Seamark shows how to do this here; Phil also wrote a great post on building aggregations for distinct counts (read it carefully – this is a really powerful technique!) here.

Text Search Performance In Power BI

In the blog post for the October 2022 release of Power BI Desktop there’s a brief description of a new optimisation in Power BI for filters that perform searches in text columns. In this blog post I’m going to share more details about this optimisation and how you can make sure your reports can benefit from it.

First of all, a brief description of the problem. Let’s say you have a Power BI dataset containing just the following table:

Let’s also say you want to build a report where a user can search for certain terms in the Description field and display a filtered table in a report:

Note how the Description field has been dragged into the Filter pane, the filter type is set to “Advanced filtering”, the “Show items when the value” dropdown has “contains” selected and the search term is “citrus”, so the table on the left only shows the fruit where the description includes the text “citrus”. Some custom visuals such as the Text Filter visual have very similar functionality.

Here’s the DAX query generated for the table visual in this screenshot:

DEFINE
	VAR __DS0FilterTable = 
		FILTER(
			KEEPFILTERS(VALUES('Fruit'[Description])),
			SEARCH("citrus", 'Fruit'[Description], 1, 0) >= 1
		)
	VAR __DS0Core = 
		CALCULATETABLE(
			SUMMARIZE('Fruit', 'Fruit'[Fruit Name], 'Fruit'[Description]),
			KEEPFILTERS(__DS0FilterTable)
		)
	VAR __DS0PrimaryWindowed = 
		TOPN(501, __DS0Core, 'Fruit'[Fruit Name], 1, 'Fruit'[Description], 1)
EVALUATE
	__DS0PrimaryWindowed
ORDER BY
	'Fruit'[Fruit Name], 'Fruit'[Description]

As you can see, the filter in this query is accomplished using the DAX Search() function. This is a great example of the type of query that the optimisation I’m talking about can speed up.

Here are some more details about how this optimisation works:

This optimisation is now enabled and works in the same way in both Power BI Desktop and the Power BI Service.
The first time that any query or measure that uses either the Search() or ContainsString() DAX functions on a text column is evaluated, Power BI starts to build a special text index just for that column.
This index build will only be successful if two conditions are true:
- The text column must only contain characters from the classic 128 character ASCII set.
- The index build must take less than 25 seconds. If 25 seconds elapse then the build will timeout and Power BI will continue to run the query without the index being present.
If the index build succeeds for that column then the index can be used by all subsequent queries by all users, but it will be dropped when:
- Power BI Desktop is restarted, if you’re in Power BI Desktop.
- The dataset is refreshed, either in Power BI Desktop or the Power BI Service.
- The dataset is evicted from memory in the Power BI Service or when the dataset is under memory pressure.
DAX queries that use the index will be a lot faster than queries that do not, although the difference will only be noticeable when you are searching on a table with thousands of rows and in a column with reasonably long text values.
There is no way for you to know whether an index has been built or whether the build has failed, or if a DAX query uses an index, unfortunately. However if you look at the duration of the DAX queries that do this kind of search (for example in Log Analytics or by running a Profiler trace) and you see the first query after a refresh is relatively slow and subsequent queries are almost instant then it’s likely that the index has been built successfully; on the other hand if your queries are consistently slow then it’s likely the index has not been built successfully. Before you all leave comments complaining, I know this is not ideal and I hope we’ll be able to make further improvements in the future.

How can you ensure that the index is built successfully? The only way to ensure that you stay under the 25 second timeout limit is to reduce the amount of text that needs to be indexed, either by reducing the number of rows in the table or by reducing the amount of text in the column. Reducing the amount of text in the column is probably the only feasible option: for example you may be able to remove words like “and” and “the” from your text since users are less likely to need to search for them. Making sure your text column only contains ASCII characters is much harder because you can’t just remove all non-ASCII characters (such as characters with diacritics) without making your text unsearchable; I have a good-enough solution in the form of an M custom function which removes diacritics from characters in text and ensures that only ASCII characters are left here.

[Thanks to Jeffrey Wang for the information in this post]

Removing Diacritics From Text In Power Query

I faced an interesting challenge recently: I had to use Power Query to remove all non-ASCII characters from some text but maintain the readability of that text. Let’s take the French phrase “un garçon très âgé à Noël” (which I know doesn’t make much sense but is great for testing) as an example. If you remove everything apart from the basic latin alphabet characters and spaces using Text.Select as I blogged here:

Text.Select(
"un garçon très âgé à Noël", 
List.Combine(
{{"A".."Z"},{"a".."z"},{" "}}
)
)

[Rick de Groot has a very detailed post on Text.Select here that’s worth reading]

…then the output is “un garon trs g Nol”. As you can see, removing all the characters leads to unreadable text. Instead, what you have to do is find all the letters with diacritics (accents and other glyphs that can be added to characters) and remove the diacritics. Doing this may be ungrammatical and make it harder to understand the meaning of the text but in most cases the text will still be readable.

The bad news is that there is no straightforward way to do this in Power Query, and indeed there is no straightforward way to do this at all because there are no hard-and-fast rules about what to replace a letter with a diacritic with: should “ö” become “o” or “oe” for example? My first thought was to create a big lookup table with all the rules of what to replace each character with in, similar to the approach taken here for solving this problem in Google Sheets. Building a comprehensive lookup table would be gigantic task though.

Luckily there is another solution (thanks to Curt Hagenlocher for finding this one for me) that is described in this thread on StackExchange: converting the text to binary using the Greek (ISO) code page and converting it back to text again using the Text.FromBinary and Text.ToBinary M functions. I’m not sure I understand why it works but it seems to work well enough for my purpose. Here’s the M code to do this:

Text.FromBinary(
Text.ToBinary(
"un garçon très âgé à Noël", 
28597
)
)

The output of this is “un garcon tres age a Noel”, which is exactly what we want. Of course it’s not perfect and there are cases where characters can’t be converted. If you take the text “Malus × zumi” (it’s a type of crabapple apparently), then:

Text.FromBinary(
Text.ToBinary(
"Malus × zumi", 
28597
)
)

returns the text “Malus ? zumi” – the “×” has been replaced by “?”. As a result you’ll probably also want to replace any question marks with a space and then remove any excess spaces; I’m going to ignore the fact that doing this might also remove any question marks that were in the original text.

Here’s an M custom function that does all this and also solves my original problem of only returning ASCII characters:

(inputText as text) as text =>
  let
    ReplaceDiacritics = Text.FromBinary(Text.ToBinary(inputText, 28597), TextEncoding.Ascii), 
    RemoveQuestionMarks = Text.Replace(ReplaceDiacritics, "?", " "), 
    RemoveExcessSpaces = Text.Combine(
      List.RemoveItems(Text.Split(Text.Trim(RemoveQuestionMarks), " "), {""}), 
      " "
    )
  in
    RemoveExcessSpaces

Power BI Migration Tools: What They Can And Can’t Do

Sometime last year, hundreds of people working at Microsoft BI consultancies around the world all had the same idea. Here’s what they were thinking:

I have so many customers wanting to migrate from legacy BI tools to Power BI. They are concerned that their current BI tool has an uncertain future. Licence renewals are looming and in the current economic climate organisations are looking to save money. Power BI is not only a lot cheaper than other BI tools, it’s a better tool overall and since Microsoft continues to make big investments in it then migration is clearly a no-brainer.

As a Power BI consultancy owner I have a problem though: I don’t have enough skilled people working for me to keep up with all this demand. What’s the answer? I know! Let’s build a tool that can help migrate all these legacy reports to Power BI!

The result is that, so far this year, I’ve seen or heard of five or six different Power BI migration tools built by various consultancies. That’s great and here are Microsoft we’re naturally supportive of our partners and want as many people to use Power BI as possible. I have reservations about some of these tools though, and these reservations fall into two categories.

First of all, some of the tools I’ve seen do things that are unsupported. In particular they programmatically generate .pbix files in an attempt to generate Power BI datasets and reports, and it is not possible to do this in a supported way. Generating datasets is certainly possible if you’re using Premium and making calls via the XMLA Endpoint but there is no supported way to automatically generate Power BI reports in this way at the time of writing. If you see a demo or run some tests, reports created this way will appear to work but there are no guarantees that future changes to Power BI will not break them. If that happens and thousands of reports in production suddenly stop working then you’ll naturally open a support case with Microsoft – and be told there’s nothing we can do to help. We know it would be great if there was a supported way to programmatically generate .pbix files and I hope that there will be one in the future, but for now this is the way it is.

Second, trying to replicate exactly what you did in your old BI tool in Power BI is not a good idea. This is a topic I wrote about in detail here but the point is that what was a best practice in your old BI tool may not be a best practice in Power BI: maybe your old tool liked wide flat tables rather than star schemas; maybe your old tool generated SQL queries against your data warehouse but in Power BI Import mode would be a better choice, and so on. As a result using an automated migration tool might give you quick results but result in more problems further on down the line.

I want to be clear that not all migration tools suffer from these problems. Most of the tools I’ve seen work within the boundaries of what is supported and leave a lot of room for consultants to make design changes. I’ve been seriously impressed by a few of them. What’s more, some of these tools do a great job on things like identifying what reports exist today, which people use them, which data sources they use and other things which are essential to the migration process, so they do add a lot of value. Thousands of organisations have already migrated to Power BI and here at Microsoft we know a lot about what makes a migration project successful, and good tools will certainly make migration quicker, easier and cheaper. The point I want to make is that if you’re building a migration tool or considering buying one you should understand what is and isn’t supported and which problems they can and can’t solve.

Dynamic What-If With User-Entered Data In Power BI

My recent post on passing any value to a Power BI dynamic M parameter from a report generated quite a lot of interest but only among people who use DirectQuery mode and who want more flexibility when filtering. In this post, however, I’ll show you an application for this technique that is useful for anyone, regardless of whether they are using DirectQuery mode or Import mode, and which is nothing to do with filtering.

You’re probably familiar with Power BI’s what-if parameter feature which allows end users to pass values into calculations. It’s great but it has one big limitation: all the values that can be used need to be hard-coded in a calculated table at design time. What if (haha) you could let your users enter the values they wanted to use in the report instead? Let’s see a simple example of how to do this.

Here’s an Import mode fact table containing sales values for various products:

The aim is to allow end users to enter one or more percentage values in a report and have these used to calculate growth forecasts on the Sales Amount value.

To do this you need to use the technique from the blog post referenced above to let the users enter the percentage values. First, create an M parameter of type Decimal Number (called EnteredValueParam) in the Power Query Editor:

Then create a dummy query (in this case called EnteredValues) with no rows and a single numeric column to bind the parameter to. Here’s the M code for this:

#table(
  type table [EnteredValues = number],
  {}
)

Now comes the fun part. Rather than use these entered values to filter a SQL query, they will be used to generate a SQL statement that returns the same values in the form of a disconnected, DirectQuery table connected to a relational database (SQL Server in this case). Here’s the M code for a query, in this example called ReturnedValues, that does this:

let
  //Check if the M parameter returns a list                                         
  //If it doesn't, turn it into a list                                    
  FilterList =
    if Type.Is(
      Value.Type(EnteredValueParam),
      List.Type
    )
    then
      EnteredValueParam
    else
      {EnteredValueParam},
  //Generate all the individual SELECT statements                                               
  GenerateSQLQueries = List.Transform(
    FilterList,
    each "SELECT '"
      & Text.From(_)
      & "%' AS Forecast, "
      & Text.From(_)
      & " AS Pct"
  ),
  //Concatenate all the SELECT statements with UNION ALL into a single statement                                                                              
  UnionAll = Text.Combine(
    GenerateSQLQueries,
    " UNION ALL "
  ),
  //Connect to the SQL Server database and run the query                                                      
  Source = Sql.Database(
    "localhost",
    "AdventureWorksDW2017"
  ),
  RunQuery = Value.NativeQuery(
    Source,
    UnionAll
  )
in
  RunQuery

If the parameter EnteredValueParam is passed the single value 1, the ReturnedValue query generates the following SQL query and runs it:

SELECT '1%' AS Forecast, 1 AS Pct

Here’s what the ReturnedValue query returns:

If the parameter EnteredValueParam returns the list {1,25,50,75}, the ReturnedValue query generates the following SQL query and runs it:

SELECT '1%' AS Forecast, 1 AS Pct 
UNION ALL 
SELECT '25%' AS Forecast, 25 AS Pct 
UNION ALL 
SELECT '50%' AS Forecast, 50 AS Pct 
UNION ALL 
SELECT '75%' AS Forecast, 75 AS Pct

Here’s what the ReturnedValue query returns in this case:

At this point, with all the queries loaded, the dataset looks like this:

The ReturnedValues table is in DirectQuery mode; the Sales table is in Import mode; the Entered Values table is in Import mode and the EnteredValues column on that table is bound to the EnteredValueParam M parameter, which has the Multi-select property turned on.

With all this in place you can create a measure on the Sales table which takes the sum of Sales Amount and applies a percentage growth based on the selected value in the Pct column of the ReturnedValues table:

Sales Amount Forecasts =
SUM ( Sales[SalesAmount] )
    * (
        ( SELECTEDVALUE ( ReturnedValues[Pct] ) / 100 ) + 1
    )

Finally, here’s a report page that uses the Filter By List custom visual to allow end users to enter the percentage forecast values and has a matrix to display the measure above with the Product field from the Sales table on rows and the Forecast field from the ReturnedValues table on columns:

Here it is in action:

It’s a bit of a pain that you need a DirectQuery data source in order to be able to do this but it’s not too difficult to set one up and it won’t need much in the way of resources. I think this technique (which I hereby name “Dynamic DirectQuery Disconnected Dimension Tables”) could be useful in other scenarios too, such as ABC analysis. If you can think of other ways to use this let me know by leaving a comment below!

Avoiding The “Maximum Allowed Size” Error In Power BI DirectQuery Mode With Aggregations On Degenerate Dimensions

Back in December I wrote a post explaining why you may encounter the “The resultset of a query to external data source has exceeded the maximum allowed size” error when working with degenerate dimensions in DirectQuery mode in Power BI. In that post I showed an example of how a moderately complex DAX expression in a measure can trigger the error even when you’re applying a filter in a visual; in this post I’ll show how you can use aggregations to avoid that error.

A quick recap: say you have a dataset built from the ContosoRetailDW sample database with a Date dimension table and a fact table called FactOnlineSales with more than 12 million rows in it.

There are two measures:

Sales Amount = SUM(FactOnlineSales[SalesAmount] )

Error Demo = 
var s = [Sales Amount]
return if(s>0, s)

If you build a table visual with the SalesOrderNumber column (a degenerate dimension from the fact table with the same granularity as the fact table) on rows and just the [Sales Amount] measure in, and you filter to a single date, then everything works ok:

…but if you add the [Error Demo] measure too a SQL query is generated to get all the values from the SalesOrderNumber column which returns more than a million rows and triggers the error:

For more details please see the original post.

Since the error is caused by a DirectQuery query to get all the values from the SalesOrderNumber column, one workaround is to build an Import mode aggregation table that contains all the distinct values from that column.

It was easy to do this in my case just using Power Query – I just duplicated the FactOnlineSales query, removed all columns apart from SalesOrderNumber, and did a “Remove Duplicates” transform. This left me with a new table containing just one column, SalesOrderNumber, that I could load into my dataset using Import mode:

I then configured this new table as an aggregation table for the FactOnlineSales table, with the SalesOrderNumber column from the new table set as a GroupBy on the SalesOrderNumber column from FactOnlineSales:

With this aggregation table configured, the table with both measures in no longer gives an error:

A Profiler trace shows that the query to get the data needed for the table is still in DirectQuery mode, but the “dimension query” to get all the distinct values from SalesOrderNumber now hits the Import mode aggregation:

I know what you’re going to say though: “If I’m using DirectQuery for my fact table I don’t want to build Import mode aggregations!”. Well yes, there are some limitations to point out with this approach. In my opinion it will work well if you are using DirectQuery mode because you have very large fact tables – even a few billion rows – but your data doesn’t change very frequently (say only once a day). In that scenario refreshing an aggregation table containing just a single column could be very fast and take up a relatively small amount of memory in Power BI, at least in comparison with an Import mode table containing all the columns from the fact table. Using incremental refresh on the aggregation table will also help but unfortunately you can’t use a hybrid table as an aggregation table at the time of writing this post, so you can’t mix Import mode and DirectQuery for the aggregation table. On the other hand if you’re using DirectQuery because your data changes frequently during the day then I don’t think this approach will work because it will be impossible to keep the contents of your Import mode aggregation table in sync with the contents of your DirectQuery fact table.

The “Block length does not match with its complement” Error In Power Query In Power BI And Excel

When using the Web.Contents function to call a web service in Power Query in either Power BI or Excel you may encounter the following error:

DataFormat.Error: Block length does not match with its complement.

Here’s an example of an M query that calls a web service and, at the time of writing, gives this error:

Web.Contents(
  "https://api.open-meteo.com/v1/forecast?latitude=52.52&longitude=13.41&hourly=temperature_2m"
)

The problem is caused by something to do with how some web services handle “deflate” compression; apparently the same issue exists in the equivalent native .NET functionality. There are two ways to avoid it though.

First of all, the easy way: ask the web service to use “gzip” compression instead. You can do this by setting the “Accept-Encoding” header to “gzip” like so:

Web.Contents(
  "https://api.open-meteo.com/v1/forecast?latitude=52.52&longitude=13.41&hourly=temperature_2m",
  [
    Headers = [
      #"Accept-Encoding" = "gzip"
    ]
  ]
)

Having done this the error will no longer occur and you can go on to handle the data returned by the web service (which is a JSON document) as normal.

The second is to use the Web.BrowserContents function instead of Web.Contents:

Web.BrowserContents(
  "https://api.open-meteo.com/v1/forecast?latitude=52.52&longitude=13.41&hourly=temperature_2m"
)

This is more complicated because the Web.BrowserContents function returns the HTML for a web page as viewed by a web browser. Apart from issues like handling authentication you’ll need to parse the result to get the data you need:

It’s doable using the Html.Table function though. Here’s an example of how to handle the response for this particular web service call:

let
  Source = Web.BrowserContents(
    "https://api.open-meteo.com/v1/forecast?latitude=52.52&longitude=13.41&hourly=temperature_2m"
  ),
  HTMLTable = Html.Table(
    Source,
    {{"JSONPayload", "pre"}}
  ),
  JSON = Json.Document(
    HTMLTable{0}[JSONPayload]
  )
in
  JSON

[Thanks to Thais Marshall for bringing me this problem, and to Curt Hagenlocher for explaining and it showing me the first workaround]

Passing Any Arbitrary Value From A Power BI Report To A Dynamic M Parameter

Dynamic M parameters are incredibly useful in Power BI when using DirectQuery mode because they allow you to pass values from your report direct to your data source query, bypassing Power BI’s own SQL (or whatever query language your source uses) generation layer. However it isn’t obvious how to pass any value you want: dynamic M parameters must be bound to slicers or filters, and they must be bound to tables in your dataset, which means that at first glance it seems like you can only pass values that are already present somewhere in your dataset (for example in a dimension table) to a dynamic M parameter. This isn’t true though: there is a way to allow passing of any value your users enter to a dynamic M parameter and in this post I’ll show you how.

First of all, why is this useful? There are two reasons why you might want to allow passing of any value to a dynamic M parameter rather than binding to a slicer or filter in the normal way:

Displaying the list of possible values in a slicer or filter can be expensive, especially if the column you’re slicing on contains thousands or millions of values. It can slow down your report and lead to extra queries being run on your DirectQuery source, which can lead to performance problems.
Sometimes the values you want to pass in don’t exist in your DirectQuery source. The scenarios where this is useful are very similar to the scenarios where you’d want to use what-if parameters, but the big limitation of what-if parameters is that you have to pre-calculate all the values that a user might ever want to select and store them in a table. What happens if you don’t want to, or can’t, pre-calculate all the input values?

Let’s see an example of how you can pass any value you want to a dynamic M parameter.

The AdventureWorksDW2017 sample SQL Server database has a table called FactInternetSales with a column called SalesOrderNumber, and let’s say you want to build a Power BI report where an end user can enter whatever SalesOrderNumbers they want and filter the table by them. Let’s also assume that we have to use dynamic M parameters to do this efficiently (which isn’t the case with AdventureWorksDW2017 but could well be in the real world). Here’s what the data looks like:

To set things up in the Power Query Editor you need an M parameter to hold the values entered by the end user. In this example the parameter is called SalesOrderNumber:

Note that the Current Value property is set to a value that is not a valid sales order number, so when no input has been received from the users then no rows will be returned by the report.

Next you need a query that filters the FactInternetSales table by this parameter. Here’s the M code:

let
  Source = Sql.Database(
    "localhost",
    "AdventureWorksDW2017"
  ),
  dbo_FactInternetSales = Source
    {
      [
        Schema = "dbo",
        Item   = "FactInternetSales"
      ]
    }
    [Data],
  #"Removed Other Columns"
    = Table.SelectColumns(
    dbo_FactInternetSales,
    {
      "OrderDateKey",
      "SalesOrderNumber",
      "SalesOrderLineNumber",
      "SalesAmount"
    }
  ),
  FilterList =
    if Type.Is(
      Value.Type(SalesOrderNumber),
      List.Type
    )
    then
      SalesOrderNumber
    else
      {SalesOrderNumber},
  #"Filtered Rows" = Table.SelectRows(
    #"Removed Other Columns",
    each List.Contains(
      FilterList,
      [SalesOrderNumber]
    )
  )
in
  #"Filtered Rows"

There are two interesting things to notice here:

This code handles the cases where a user enters a single value, in which case the SalesOrderNumber M parameter will be of type text, or when the user enters multiple values in which case the SalesOrderNumber M parameter will be of type list. For more details on handling multi-select in dynamic M parameters see this post.
The actual filtering is done using the List.Contains M function, which does fold on SQL Server-related data sources. If you’re using other sources you should check if query folding happens for List.Contains for your source.

Thirdly, you need a dummy dimension table with a single column for the dynamic M parameter to be bound to in the report. The dimension table shouldn’t contain any data; here’s the M code to use (the query is called DimSalesOrderNumber):

let
  Source = #table(
    type table [SalesOrderNumber = text],
    {}
  )
in
  Source

This query returns a table with a single text column called SalesOrderNumber and no rows:

Once you’ve left the Power Query Editor the next thing to do is to bind the SalesOrderNumber M parameter to the SalesOrderNumber column of the DimSalesOrderNumber table:

Note that the Multi-select property has been enabled. Binding the dynamic M parameter to a table with no rows in means there’s no way a regular slicer could be used with it, because there are no values in the table for the slicer to display.

The last problem to solve is the important one: how do you allow end users to enter any value they want? There are two ways I know of. One is to use the filter pane and the “is” filter type under “Advanced filter”:

The filter pane is great because it’s built-in but it only allows the user to enter one or two (if they use the “Or” option) values to filter on. Remember also that not all filter types are available when you’re using dynamic M parameters.

A better approach, if you’re ok with using custom visuals, is to use the Filter By List custom visual which allows the end user to enter – or even copy/paste in – a list of values:

Chris Webb's BI Blog

Data Wrangler: A Power Query-Like Experience For Python People

Like this:

DirectQuery Parallelisation In Power BI – Some Examples

Like this:

ApproximateDistinctCount DAX Function Now Works On More DirectQuery Sources

Like this:

Text Search Performance In Power BI

Like this:

Removing Diacritics From Text In Power Query

Like this:

Power BI Migration Tools: What They Can And Can’t Do

Like this:

Dynamic What-If With User-Entered Data In Power BI

Like this:

Avoiding The “Maximum Allowed Size” Error In Power BI DirectQuery Mode With Aggregations On Degenerate Dimensions

Like this:

The “Block length does not match with its complement” Error In Power Query In Power BI And Excel

Like this:

Passing Any Arbitrary Value From A Power BI Report To A Dynamic M Parameter

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this: