Understanding The “A Measure Is Used In Cross Highlighting” Error In Power BI

When interacting with a Power BI report you may occasionally get the following error:

Couldn’t load the data for this visual

We can’t display this visual because a measure is used in cross highlighting. Please remove the measure or cross highlight.

What does this mean? The most important thing to understand is that you have not done anything wrong: you have, unfortunately, run into a limitation of Power BI where it can’t generate the query needed for a selection on a visual. The issue occurs in some very rare cases when Power BI needs to filter a measure using multiple fields from different tables. Since the only workarounds involve changing your visuals, changing the way you interact with your report, or changing the way your report behaves, it is helpful to understand the scenarios where you will encounter it.

To do this, take the following basic Power BI dataset:

There are two products in the Product table – Apples and Oranges – and two countries in the Country table – UK and France, plus some sales values in the fact table. All the data in the dataset (minus the keys) can be shown in a table like so:

Note that fields from all three tables in the dataset (the Country field from the Country table, the Product field from the Product table and the Sales field from the Sales table) are used in this visual and Sales is used as a measure.

One way to trigger this error is to select any one of the rows in the body of this table, right-click and select Exclude:

The presence of a measure plus two fields from two different tables is the key to making the error occur: if you remove either the Country or Product field then the Exclude will work OK. You could also remove the Sales measure, but it’s likely that any of these changes will result in the visual not displaying what you want it to display, so you’re better off educating your users to expect to see this error if they use Exclude. Selecting the rows you want to keep and using Include instead also works:

Another way to trigger the error is to cross-highlight this table by selecting two rows from different levels in a separate matrix visual on the same page. So, for example, let’s say the same data was displayed in a matrix visual alongside the table visual on the same page:

If, in the matrix visual, you select the France row and the UK/Apples row at the same time, then the error will occur again:

If you use Edit Interactions to stop the matrix from cross highlighting the table then you’ll avoid the error:

Another possibility is to cross highlight by selecting rows that are all at the same level instead of selecting rows from different levels. So, for example, if you change the second matrix to be another table and select the two rows for France and the row for UK/Apples, you won’t get the error:

I’m sure there are other scenarios where this occurs (I see this is called out by OKVIZ here for some of their custom visuals) but hopefully these examples are enough to help you understand what’s going on here.

Identifying CPU And Memory-Intensive Power Query Queries During Refresh In The Power BI Service

Last year I wrote a post about a change in the Power BI Service that meant the CPU Time associated with Power Query transformations was added to the total shown for a dataset refresh operation in Profiler traces and Log Analytics:

https://blog.crossjoin.co.uk/2022/07/03/measuring-power-query-cpu-usage-during-power-bi-dataset-refresh/

This was useful, but it didn’t tell you directly how much CPU Time was used by Power Query and it didn’t tell you which tables or partitions in a refresh were using the most CPU. It also didn’t tell you anything about Power Query memory usage. The good news is that there has recently been another change that solves these problems.

Let’s say you have a Power BI dataset that consists of a single table whose source is the following Power Query query:

let
  Source = #table(type table [MyNumber = number], List.Transform({1 .. 1000000}, each {_})), 
  #"Added Custom" = Table.AddColumn(
    Source, 
    "ARandomNumber", 
    each Number.RandomBetween(0, 10000), 
    type number
  ), 
  #"Sorted Rows" = Table.Sort(#"Added Custom", {{"ARandomNumber", Order.Ascending}})
in
  #"Sorted Rows"

This query creates a table with a million rows, adds a column with random numbers in and then sorts on that column – which is, as you’d expect, a very CPU and memory-hungry operation.

If you refresh this dataset in the Power BI Service and run a Profiler trace on it, looking at the Command Begin/End and Progress Report Begin/End events, this is what you’ll see:

The final Command End event shows the total duration of the refresh as well as the amount of CPU used by both the Analysis Services engine and Power Query – in this case 24094ms.

If you look at the Progress Report End event associated with the finish of the refresh for the only partition of the only table in the dataset (highlighted in the screenshot above), there’s some extra information:

It shows the amount of CPU Time and the maximum amount of memory used by the Power Query engine while refreshing this partition. In this case the Power Query engine used 19468ms of CPU and reached a peak of 581848KB of memory. I can tell this is going to be really useful for troubleshooting refresh performance issues and out-of-memory errors.

[Thanks to Akshai Mirchandani, Xiaodong Zhang, Ting-Wei Chang and Jast Lu for this information]

Why Doesn’t Power Query Know If My Step Will Fold?

Recently, when preparing my session on query folding in Power Query for SQLBits, I wrote the following query to demonstrate the “Might Fold” indicator in Power Query Online:

let
  Source = OData.Feed(
    "https://services.odata.org/TripPinRESTierService", 
    null, 
    [Implementation = "2.0"]
  ), 
  Navigation = Source{[Name = "People", Signature = "table"]}[Data], 
  #"Filtered rows" = Table.SelectRows(Navigation, each [FirstName] = "Angel")
in
  #"Filtered rows"

This query connects to the TripPin OData feed (which is public, so you’ll be able to try this yourself) and filters the People table so it only gets the rows where the FirstName column equals “Angel”. If you paste this query into the Advanced Editor in Power Query Online to create a dataflow, you’ll see that the filter shows the “Might Fold” step indicator:

This tells you that Power Query doesn’t know whether the filter on the FirstName column folds or not. The Query Plan view shows two alternative plans: one for the case where the query folds and one for the case where it doesn’t:

The question is, though, why can’t Power Query tell if the query will fold? I didn’t know so I asked Curt Hagenlocher of the Power Query development team, who very kindly explained.

It turns out that OData sources (and also ODBC sources) sometimes misreport their support for query folding. At the time the query plan above is generated, Power Query has already asked the source for its list of supported OData capabilities, but it won’t know for sure whether the filter can be folded until the query is actually run. Similarly, some OData sources have an undeclared maximum URL length limit, which means that if Power Query exceeds it, it may get an error or even incorrect data back. As a result, the runtime behaviour of Power Query has several fallbacks: it will try to fold fully, then fold some transforms, then fold nothing. All of which explains why the “Might Fold” step indicator exists.
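
For reference, when the filter does fold, Power Query pushes it into the OData request as a standard $filter query option, so the call to the TripPin service looks something like this (a sketch based on OData conventions rather than a captured trace, with the spaces shown unencoded for readability):

GET https://services.odata.org/TripPinRESTierService/People?$filter=FirstName eq 'Angel'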

Power BI TMDL And Version Control Announcements From SQLBits

If you missed the big announcement about Tabular Model Definition Language and the future of Power BI version control at SQLBits last week, the recording of the session has already been published:

If you’re a professional Power BI developer you must watch this one! Mathias Thierbach (of https://pbi.tools/ fame) has also published the slides here.

This is just the beginning of a much better story for pro-developers in Power BI…

Data Wrangler: A Power Query-Like Experience For Python People

If, like me (and Ruth), you spend your life in Power BI but have a lingering feeling that you should get round to learning Python for data analysis sometime, then here’s something you should check out: the new Data Wrangler extension for Visual Studio Code. All the details are in the announcement blog post:

https://devblogs.microsoft.com/python/data-wrangler-release/

…and this video is a great introduction to what it does:

Why is it interesting for someone like me? Because it works in a very, very similar way to Power Query – except that instead of generating M code in the background, it generates Python. It doesn’t have the same amount of functionality as Power Query and its UI is a bit more basic, but anyone with Power Query experience will feel immediately at home. I got it up and running very easily and I can see that it will be great for anyone learning Python or who needs a productivity boost.

DirectQuery Parallelisation In Power BI – Some Examples

Recently we announced an important new optimisation for DirectQuery datasets: the ability to run some of the queries generated by a single DAX query in parallel. You can read the blog post here:

https://powerbi.microsoft.com/en-za/blog/query-parallelization-helps-to-boost-power-bi-dataset-performance-in-directquery-mode/

A few of us on the Power BI CAT team have tested this out with customers and seen some great results, so I thought I’d write a post illustrating the effect this optimisation can have and explaining when it can and can’t help.

For the purposes of this blog post I built a DirectQuery dataset (the source and the design of the dataset are irrelevant) using the New York Taxi data. I then created a table with five measures on columns; the precise definitions of the measures don’t matter much either, except for the fact that they each generate a single Storage Engine request and horizontal fusion cannot combine these requests in any way.
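
To give a rough idea of what’s involved, a table visual like this generates a single DAX query built around SUMMARIZECOLUMNS, something like the sketch below – although note that the grouping column and the last two measure names here are invented stand-ins, not the actual definitions I used:

// A sketch of the query behind a table visual with five independent
// measures; 'DATE'[MONTH], [Total Fare] and [Tip Amount] are hypothetical
EVALUATE
SUMMARIZECOLUMNS (
    'DATE'[MONTH],
    "Trip Count", [Trip Count],
    "Monday Trips", [Monday Trips],
    "Distinct Medallions", [Distinct Medallions],
    "Total Fare", [Total Fare],
    "Tip Amount", [Tip Amount]
)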

Next, I published the dataset to the Power BI Service, connected to it from Tabular Editor, set the compatibility level property to 1569 and saved the change:

Having done this the Max Parallelism Per Query property became visible, and I set it to 1:

1 is not the default value for this property but it does give the same behaviour as the default at the time of writing – it ensures that the Storage Engine requests for a single DAX query are always executed one at a time.

Next I connected DAX Studio up to the XMLA Endpoint for the workspace and ran the DAX query generated by the table visual above with Server Timings enabled. The new timeline view in DAX Studio 3.0.6 (see Marco’s video for more details) does a great job of showing when the Storage Engine queries – in this case the SQL queries sent to the data source – are made:

As you can see, in this case it’s clear that the six Storage Engine queries/SQL queries generated by this one DAX query are all executed in sequence. Also notice that the duration of the DAX query was 3 seconds.

I then changed the Max Parallelism Per Query property to 6 and reran the same query from DAX Studio. Here’s what the timeline looked like:

The Storage Engine queries/SQL queries now all run in parallel and the overall duration of the DAX query is just 0.9 seconds!

This is great, but as always there are a few things that need to be highlighted. First, Power BI may not be able to run all the Storage Engine queries in parallel if there are certain types of dependency in your DAX. In the example above there were no dependencies between the measures – none of them referenced any other measures, and they all used fairly simple expressions – but in the real world that might not be the case. To illustrate this I created a new measure with the following definition:

MEASURE 'TRIP'[WithDependencies] =
    IF (
        CALCULATE ( [Distinct Medallions], ALL ( 'GEOGRAPHY'[STATE] ) ) > 0,
        [Trip Count],
        [Monday Trips]
    )

I then used it in a new table visual and captured the DAX query generated:

Here’s what the DAX Studio Server Timings showed for this query with Max Parallelism Per Query set to 6:

This time there is some parallelism but the first Storage Engine query/SQL query has to complete before the last two can be executed. I wrote this measure specifically to get this behaviour so you may be able to rewrite your DAX to get better parallelism, but it’s something to be aware of.

One last thing to mention: increasing parallelism here may lead to worse query performance overall if you end up running into bottlenecks elsewhere in your architecture (see here for an example). You may need to increase the “maximum number of connections per data source” property on your dataset to allow more queries to run in parallel, and if that results in additional load on the data source then you may need to scale up or scale out there too. Remember also that the DAX queries for a report page are, and always have been, run in parallel, so the Storage Engine queries for different DAX queries will also run in parallel; you’ll need to watch out for report pages with a large number of visuals on them.

ApproximateDistinctCount DAX Function Now Works On More DirectQuery Sources

Some good news for those of you using DirectQuery mode in Power BI: the ApproximateDistinctCount DAX function, which returns an estimate of the number of distinct values in a column and which can be a lot faster than a true distinct count as returned by the DistinctCount function, is now available to use with BigQuery, Databricks and Snowflake sources. It only worked with Azure SQL DB and Synapse before; Redshift support is coming soon. You can use it in exactly the same way as the DistinctCount function, except that it only works in DirectQuery mode.

For example, I have a Power BI DirectQuery dataset that uses the New York Taxi data in Snowflake as its source. With the following two DAX measures defined on the Trip table:

Approximate Distinct Medallions = APPROXIMATEDISTINCTCOUNT('TRIP'[MEDALLIONID])
Distinct Medallions = DISTINCTCOUNT('TRIP'[MEDALLIONID])

…I can build a table visual that compares the output of the two measures (as you can see, the difference isn’t that big):

…and see that the ApproximateDistinctCount DAX function is translated to the APPROX_COUNT_DISTINCT function in Snowflake SQL:

select { fn convert(count(distinct("MEDALLIONID")), SQL_DOUBLE) } + { fn convert(max("C1"), SQL_DOUBLE) } as "C1",
    approx_count_distinct("MEDALLIONID") as "C2"
from 
(
    select "DATEID",
        "MEDALLIONID",
        "HACKNEYLICENSEID",
        "PICKUPTIMEID",
        "DROPOFFTIMEID",
        "PICKUPGEOGRAPHYID",
        "DROPOFFGEOGRAPHYID",
        "PICKUPLATITUDE",
        "PICKUPLONGITUDE",
        "PICKUPLATLONG",
        "DROPOFFLATITUDE",
        "DROPOFFLONGITUDE",
        "DROPOFFLATLONG",
        "PASSENGERCOUNT",
        "TRIPDURATIONSECONDS",
        "TRIPDISTANCEMILES",
        "PAYMENTTYPE",
        "FAREAMOUNT",
        "SURCHARGEAMOUNT",
        "TAXAMOUNT",
        "TIPAMOUNT",
        "TOLLSAMOUNT",
        "TOTALAMOUNT",
        case
            when "MEDALLIONID" is null
            then CAST(1 as INTEGER)
            else CAST(0 as INTEGER)
        end as "C1"
    from "NYCDATA_DB"."NYCTAXIDATA"."TRIP"
) as "ITBL"

Distinct counts are often the slowest type of measure but in my experience report consumers are very unwilling to accept seeing “near enough” numbers in their reports rather than numbers that are 100% accurate, even if the approximate distinct counts are much faster. I heard someone suggest using field parameters to allow report consumers to switch between showing fast approximate distinct counts for exploration and accurate distinct counts when they really need them, and I think this is a great compromise.
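
As a sketch of that idea, you could get a similar effect without field parameters by using a disconnected table and a SWITCH measure – the ‘Count Mode’ table and the measure below are hypothetical names of my own:

// Assumes a disconnected table 'Count Mode' with a single column [Mode]
// containing the values "Approximate" and "Accurate", used in a slicer
Medallions Shown =
SWITCH (
    SELECTEDVALUE ( 'Count Mode'[Mode], "Accurate" ),
    "Approximate", [Approximate Distinct Medallions],
    [Distinct Medallions]
)

A slicer on ‘Count Mode’[Mode] then lets report consumers flip between the fast approximate version and the accurate one.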

Bonus links: if you need to do an approximate distinct count in Import mode, Phil Seamark shows how to do this here; Phil also wrote a great post on building aggregations for distinct counts (read it carefully – this is a really powerful technique!) here.

Text Search Performance In Power BI

In the blog post for the October 2022 release of Power BI Desktop there’s a brief description of a new optimisation in Power BI for filters that perform searches in text columns. In this blog post I’m going to share more details about this optimisation and how you can make sure your reports can benefit from it.

First of all, a brief description of the problem. Let’s say you have a Power BI dataset containing just the following table:

Let’s also say you want to build a report where a user can search for certain terms in the Description field and display a filtered table in a report:

Note how the Description field has been dragged into the Filter pane, the filter type is set to “Advanced filtering”, the “Show items when the value” dropdown has “contains” selected and the search term is “citrus”, so the table on the left only shows the fruit where the description includes the text “citrus”. Some custom visuals such as the Text Filter visual have very similar functionality.

Here’s the DAX query generated for the table visual in this screenshot:

DEFINE
	VAR __DS0FilterTable = 
		FILTER(
			KEEPFILTERS(VALUES('Fruit'[Description])),
			SEARCH("citrus", 'Fruit'[Description], 1, 0) >= 1
		)
	VAR __DS0Core = 
		CALCULATETABLE(
			SUMMARIZE('Fruit', 'Fruit'[Fruit Name], 'Fruit'[Description]),
			KEEPFILTERS(__DS0FilterTable)
		)
	VAR __DS0PrimaryWindowed = 
		TOPN(501, __DS0Core, 'Fruit'[Fruit Name], 1, 'Fruit'[Description], 1)
EVALUATE
	__DS0PrimaryWindowed
ORDER BY
	'Fruit'[Fruit Name], 'Fruit'[Description]

As you can see, the filter in this query is accomplished using the DAX Search() function. This is a great example of the type of query that the optimisation I’m talking about can speed up.
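
For comparison, the same filter could also be written with the ContainsString() DAX function – which, as the list below notes, is the other function this optimisation applies to. Here’s a minimal sketch of the equivalent filter table expression:

// ContainsString() is case-insensitive, matching the behaviour of the
// "contains" filter in the Filter pane
FILTER (
    KEEPFILTERS ( VALUES ( 'Fruit'[Description] ) ),
    CONTAINSSTRING ( 'Fruit'[Description], "citrus" )
)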

Here are some more details about how this optimisation works:

  • This optimisation is now enabled and works in the same way in both Power BI Desktop and the Power BI Service.
  • The first time that any query or measure that uses either the Search() or ContainsString() DAX functions on a text column is evaluated, Power BI starts to build a special text index just for that column.
  • This index build will only be successful if two conditions are true:
    • The text column must only contain characters from the classic 128-character ASCII set.
    • The index build must take less than 25 seconds. If 25 seconds elapse then the build will time out and Power BI will continue to run the query without the index being present.
  • If the index build succeeds for that column then the index can be used by all subsequent queries by all users, but it will be dropped when:
    • Power BI Desktop is restarted, if you’re in Power BI Desktop.
    • The dataset is refreshed, either in Power BI Desktop or the Power BI Service.
    • The dataset is evicted from memory in the Power BI Service or when the dataset is under memory pressure.
  • DAX queries that use the index will be a lot faster than queries that do not, although the difference will only be noticeable when you are searching on a table with thousands of rows and in a column with reasonably long text values.
  • There is no way for you to know whether an index has been built, whether the build has failed, or whether a DAX query uses an index, unfortunately. However, if you look at the duration of the DAX queries that do this kind of search (for example in Log Analytics or by running a Profiler trace) and you see that the first query after a refresh is relatively slow and subsequent queries are almost instant, then it’s likely that the index has been built successfully; on the other hand, if your queries are consistently slow then it’s likely the index has not been built. Before you all leave comments complaining: I know this is not ideal and I hope we’ll be able to make further improvements in the future.

How can you ensure that the index is built successfully? The only way to ensure that you stay under the 25 second timeout limit is to reduce the amount of text that needs to be indexed, either by reducing the number of rows in the table or by reducing the amount of text in the column. Reducing the amount of text in the column is probably the only feasible option: for example, you may be able to remove words like “and” and “the” from your text, since users are less likely to need to search for them. Making sure your text column only contains ASCII characters is much harder, because you can’t just remove all non-ASCII characters (such as characters with diacritics) without making your text unsearchable; I have a good-enough solution, in the form of an M custom function which removes diacritics from characters in text and ensures that only ASCII characters are left, here.
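
For example, here’s a minimal Power Query sketch of the stop-word idea, removing a few common words from the Description column before the data is loaded – the table name, column name and word list are all just illustrative, and bear in mind that List.RemoveItems is case-sensitive:

let
  Source = Fruit,
  StopWords = {"and", "the", "a", "of"},
  RemoveStopWords = Table.TransformColumns(
    Source,
    {{"Description", each Text.Combine(List.RemoveItems(Text.Split(_, " "), StopWords), " "), type text}}
  )
in
  RemoveStopWords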

[Thanks to Jeffrey Wang for the information in this post]

Removing Diacritics From Text In Power Query

I faced an interesting challenge recently: I had to use Power Query to remove all non-ASCII characters from some text while maintaining the readability of that text. Let’s take the French phrase “un garçon très âgé à Noël” (which I know doesn’t make much sense but is great for testing) as an example. If you remove everything apart from the basic Latin alphabet characters and spaces using Text.Select, as I blogged here:

Text.Select(
  "un garçon très âgé à Noël",
  List.Combine({{"A".."Z"}, {"a".."z"}, {" "}})
)

[Rick de Groot has a very detailed post on Text.Select here that’s worth reading]

…then the output is “un garon trs g Nol”. As you can see, removing all the characters leads to unreadable text. Instead, what you have to do is find all the letters with diacritics (accents and other glyphs that can be added to characters) and remove the diacritics. Doing this may be ungrammatical and make it harder to understand the meaning of the text but in most cases the text will still be readable.

The bad news is that there is no straightforward way to do this in Power Query – and indeed no straightforward way to do this at all, because there are no hard-and-fast rules about what to replace a letter with a diacritic with: should “ö” become “o” or “oe”, for example? My first thought was to create a big lookup table containing all the rules about what to replace each character with, similar to the approach taken here for solving this problem in Google Sheets. Building a comprehensive lookup table would be a gigantic task though.
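
For illustration, a lookup-table version in M might look something like this, using List.Accumulate to apply one Text.Replace per rule – and note that these six rules only cover the example phrase, never mind every language:

let
  Rules = {{"ç", "c"}, {"è", "e"}, {"é", "e"}, {"â", "a"}, {"à", "a"}, {"ë", "e"}},
  Result = List.Accumulate(
    Rules,
    "un garçon très âgé à Noël",
    (state, rule) => Text.Replace(state, rule{0}, rule{1})
  )
in
  Result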

Luckily there is another solution (thanks to Curt Hagenlocher for finding this one for me) that is described in this thread on StackExchange: converting the text to binary using the Greek (ISO) code page and converting it back to text again using the Text.FromBinary and Text.ToBinary M functions. I’m not sure I understand why it works but it seems to work well enough for my purpose. Here’s the M code to do this:

Text.FromBinary(
  Text.ToBinary("un garçon très âgé à Noël", 28597)
)

The output of this is “un garcon tres age a Noel”, which is exactly what we want. Of course it’s not perfect and there are cases where characters can’t be converted. If you take the text “Malus × zumi” (it’s a type of crabapple apparently), then:

Text.FromBinary(
  Text.ToBinary("Malus × zumi", 28597)
)

returns the text “Malus ? zumi” – the “×” has been replaced by “?”. As a result you’ll probably also want to replace any question marks with a space and then remove any excess spaces; I’m going to ignore the fact that doing this might also remove any question marks that were in the original text.

Here’s an M custom function that does all this and also solves my original problem of only returning ASCII characters:

(inputText as text) as text =>
  let
    ReplaceDiacritics = Text.FromBinary(Text.ToBinary(inputText, 28597), TextEncoding.Ascii), 
    RemoveQuestionMarks = Text.Replace(ReplaceDiacritics, "?", " "), 
    RemoveExcessSpaces = Text.Combine(
      List.RemoveItems(Text.Split(Text.Trim(RemoveQuestionMarks), " "), {""}), 
      " "
    )
  in
    RemoveExcessSpaces
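
Assuming you save this function as a query called RemoveNonAsciiCharacters (the name is my own), invoking it on the earlier examples gives:

RemoveNonAsciiCharacters("un garçon très âgé à Noël")  // returns "un garcon tres age a Noel"
RemoveNonAsciiCharacters("Malus × zumi")  // returns "Malus zumi"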

Power BI Migration Tools: What They Can And Can’t Do

Sometime last year, hundreds of people working at Microsoft BI consultancies around the world all had the same idea. Here’s what they were thinking:

I have so many customers wanting to migrate from legacy BI tools to Power BI. They are concerned that their current BI tool has an uncertain future. Licence renewals are looming and in the current economic climate organisations are looking to save money. Power BI is not only a lot cheaper than other BI tools, it’s a better tool overall and since Microsoft continues to make big investments in it then migration is clearly a no-brainer.

As a Power BI consultancy owner I have a problem though: I don’t have enough skilled people working for me to keep up with all this demand. What’s the answer? I know! Let’s build a tool that can help migrate all these legacy reports to Power BI!

The result is that, so far this year, I’ve seen or heard of five or six different Power BI migration tools built by various consultancies. That’s great, and here at Microsoft we’re naturally supportive of our partners and want as many people to use Power BI as possible. I have reservations about some of these tools though, and these reservations fall into two categories.

First of all, some of the tools I’ve seen do things that are unsupported. In particular they programmatically generate .pbix files in an attempt to generate Power BI datasets and reports, and it is not possible to do this in a supported way. Generating datasets is certainly possible if you’re using Premium and making calls via the XMLA Endpoint but there is no supported way to automatically generate Power BI reports in this way at the time of writing. If you see a demo or run some tests, reports created this way will appear to work but there are no guarantees that future changes to Power BI will not break them. If that happens and thousands of reports in production suddenly stop working then you’ll naturally open a support case with Microsoft – and be told there’s nothing we can do to help. We know it would be great if there was a supported way to programmatically generate .pbix files and I hope that there will be one in the future, but for now this is the way it is.

Second, trying to replicate exactly what you did in your old BI tool in Power BI is not a good idea. This is a topic I wrote about in detail here, but the point is that what was a best practice in your old BI tool may not be a best practice in Power BI: maybe your old tool liked wide flat tables rather than star schemas; maybe your old tool generated SQL queries against your data warehouse, whereas in Power BI, Import mode would be a better choice; and so on. As a result, using an automated migration tool might give you quick results but cause more problems further down the line.

I want to be clear that not all migration tools suffer from these problems. Most of the tools I’ve seen work within the boundaries of what is supported and leave a lot of room for consultants to make design changes. I’ve been seriously impressed by a few of them. What’s more, some of these tools do a great job on things like identifying what reports exist today, which people use them, which data sources they use and other things which are essential to the migration process, so they do add a lot of value. Thousands of organisations have already migrated to Power BI and here at Microsoft we know a lot about what makes a migration project successful, and good tools will certainly make migration quicker, easier and cheaper. The point I want to make is that if you’re building a migration tool or considering buying one you should understand what is and isn’t supported and which problems they can and can’t solve.