Why Power BI Slicers Sometimes Show Values That Do Not Exist

The problem of blank values being shown in a Power BI slicer is fairly well known, but did you know that in some circumstances slicers can show other values that you would not expect to see? While there’s a good reason for this it can nevertheless be mildly confusing for developers and end users alike when it happens.

Let’s take a look at one way this can happen. Here’s an M query that returns a table with up to five random numbers in:

[sourcecode language='text'  padlinenumbers='true']
let
    Source = 
        {1..5},
    RandomNumbers = 
        List.Transform(
            Source, 
            each 
                Number.Round(
                    Number.RandomBetween(
                        _, 
                        _+100
                        )
                , 0)
                ),
    RemoveDuplicates = 
        List.Distinct(RandomNumbers),
    ToTable = 
        Table.FromList(
            RemoveDuplicates, 
            Splitter.SplitByNothing(), 
            null, 
            null, 
            ExtraValues.Error
            ),
    ChangedType = 
        Table.TransformColumnTypes(
            ToTable,
            {{"Column1", Int64.Type}}
            )
in
    ChangedType
[/sourcecode]

Here’s an example of the output:

image

The key point to notice is that every time this query refreshes, a potentially different set of numbers will be returned.

Now consider a report page built from this table with three visuals on it:

image

There is a slicer on the left with five items in it, a table showing the actual contents of the table (I’ve disabled visual interactions so the slicer doesn’t slice the table) with five rows and a card showing the output of the following measure:

[sourcecode language='text' ]
Selected Number = 
SELECTEDVALUE(MyNumbers[Column1], "Nothing Selected")
[/sourcecode]

In the screenshot above you can see I have selected the value 78 in the slicer and the measure – as you would expect – displays that value.

Now what happens when you refresh the dataset and the table contains a different set of numbers? Here’s what:

SlicerNewValues

 

Summarising what is shown here:

  • After the refresh has taken place the number 78 no longer appears in the underlying table, which contains five rows still
  • However, because 78 was selected before refresh took place, it still appears – and is still selected in – the slicer as an extra value meaning there are now six items visible in the slicer
  • Even though 78 appears and is selected, the measure returns “Nothing Selected” because the DAX SelectedValue() function has detected that no single value has been selected
  • As soon as another value in the slicer is selected (in this case 83), the value 78 disappears from the slicer

Something similar may occur when using the Sync Slicers functionality to synchronise two slicers with different source columns that contain different values (as described here). For example, say you have a second table created as a DAX calculated table with the following expression:

[sourcecode language='text' ]
MySecondTable = ROW("Column2",-1)
[/sourcecode]

image

If you create a slicer from this table, use Sync Slicers to synchronise it with the existing slicer on the report shown above:

image

…and then select the value –1 in the new slicer, you’ll see it results in the value –1 appearing as selected in the original slicer even though that value does not appear in the underlying column:

image

Once again, if you select something else in the slicer, the –1 value will disappear.

In my opinion this behaviour makes sense and shouldn’t be changed. In any case, in the real world I always try to avoid situations where the values in a slicer might disappear: if you have a dimensional model and are building slicers from dimension tables rather than fact tables, you should find that values are rarely removed from dimension tables, if ever. That said I can’t help wondering if there are situations where this behaviour could be useful, for example in scenarios where you want to show an initial selection that displays no data but which can’t subsequently be reselected by an end user. As always, I’m interested in your thoughts and comments.

Creating Basic Greater Than Or Less Than Rules With Power BI Conditional Formatting

Although the conditional formatting by rules feature in Power BI was released a long time ago, one very common cause of confusion is with how to implement basic “greater than” or “less than” rules. For example, say you have a table with the following data in it:

image

…and you want to highlight the rows where Sales are greater than or equal to 150. So you click on the table, go to the Conditional Formatting options for the Sales field, turn on Background Color formatting and click Advanced Controls:

image

…then choose to format by rules. You see this dialog:

image

In particular, the part of this dialog where you set up the rule:

image

…seems to suggest that you need to enter a “is greater than or equal to” condition and a “is less than” condition for the rule to be valid. This is not true, and you don’t need to enter some arbitrarily large number for the “is less than” condition to make it work. The following screenshot shows how you can set up a rule to highlight all rows where Sales are greater than or equal to 150:

image

The two things to notice are:

  • I’ve entered 150 in the first condition, as you would expect
  • I have deleted the 0 from the second “is less than” condition, leaving the textbox empty (meaning that the text “Maximum” is visible but greyed out)

Here’s the result:

image

Job done. What about a slightly more complex but nonetheless common scenario where values greater than 0 are shown as green, values equal to 0 as yellow and values less than 0 as red (with maybe some special handling for blanks too)? Here’s a sample table:

image

(The Dummy column is only there to make sure the Oranges row, which contains a blank value, is visible in the table)

Here’s a set of rules that does what we need:

image

And here’s the output:

image

Q&A Buttons And Power BI Service Live Connections

A quick Q&A-related tip. If you create a report in Power BI Desktop and use a Power BI service live connection:

image

…and then you put a Q&A button on the report, clicking on the button gives the following error message:

image

Q&A isn’t supported for reports that use DirectQuery data sources, live connections to older Analysis Services data sources, or non-English language models

However, this message only means that Q&A buttons do not work in Power BI Desktop with Power BI service live connections. If you publish the report to the Power BI Service you will find that the Q&A button does work correctly.

image

[Thanks to Seth Bauer and Aaron Meyers for the information]

DAX Median() Function Does Not Work On Tables With More Than 2 Billion Rows

An interesting – if obscure – fact I learned recently is that a small number of DAX functions such as Median() do not work on tables with more than 2 billion rows in Analysis Services Tabular, Azure AS and Power BI.

It’s quite easy to reproduce in Power BI. The following M expression returns a table with two billion and four rows:

let

Source =
List.Repeat(
{1,2,3,4},
500000001
),
#"Converted to Table" =
Table.FromList(
Source,
Splitter.SplitByNothing(),
null,
null,
ExtraValues.Error
),
#"Changed Type" =
Table.TransformColumnTypes(
#"Converted to Table",
{{"Column1", Int64.Type}}
)
in
#"Changed Type"

It takes some time to load this table  – around twenty minutes – but because there are only four distinct values in the table the resulting .pbix file is only 31KB thanks to the way Power BI compresses data.

If you load this table into your dataset, call it VeryBigTable and create the following measure:

Median Test = MEDIAN(VeryBigTable[Column1])

…and use the measure in a visual, you’ll see the following error:

image

The current query cannot be evaluated for the ‘VeryBigTable (42)’ table, because the table contains more than two billion rows.

What’s more, the error will always occur even if you apply a filter to the table that returns less than two billion rows. The same problem occurs with some other functions, such as Percentile(), but it’s worth pointing out that the vast majority of DAX functions work as normal with tables with more than two billion rows – for example, in the pbix file used here the Sum() and CountRows() functions not only work fine but return instantly.

Luckily, in the case of the Median() function, there is an easy workaround because you can calculate a median in other ways such as the one described on the DAX Patterns site here. The code is a lot more verbose but it works on a 2 billion+ row table.

image

Detailed Article On The Power Query Formula Firewall And Data Privacy Settings

While writing my Microsoft Business Applications Summit session on Power Query-related reasons for Power BI dataset refresh failures (which you can watch online here for free, along with all the other sessions), I remembered something important that I have been meaning to blog about for a while. It’s an excellent article by Ehren von Lehe of the Power Query dev team that contains the first fully detailed explanation of how the Power Query Formula Firewall and data privacy settings actually work. Called “Behind the scenes of the Data Privacy Firewall” you can read it on the Power Query MSDN forum here:

https://social.technet.microsoft.com/Forums/en-US/ca434e2d-88fe-4962-b46a-a1db51e8bd89/feedback-wanted-behind-the-scenes-of-the-data-privacy-firewall?forum=powerquery

It’s a must-read for any Power Query fan, and it answers a number of the questions I raised in my series of blog posts on the data privacy settings here.

Removing HTML Tags From Text In Power Query/Power BI

Recently I found I needed to remove all the HTML tags from some text in Power Query. I searched and found a great – if complex – function from Marcel Beug here, but I realised that since that post was written the Html.Table M function has been added to the language and that it makes the task very easy. In fact it’s basically the same as the solution I blogged about here for extracting URLs from a web page.

Here’s an example of how to do it:

[sourcecode language=”text” padlinenumbers=”true”]
let
Source =
"<!DOCTYPE html>
<html>
<body>
<h1>A Heading</h1>
<p>A paragraph</p>
</body>
</html>",
RemoveTags = Html.Table(Source, {{"text",":root"}}),
GetText = RemoveTags[text]{0}
in
GetText
[/sourcecode]

Here’s the output of this query:

image

The important part is the call to Html.Table using the “:root” CSS selector:

[sourcecode language=”text”]
Html.Table(Source, {{"text",":root"}})
[/sourcecode]

This returns a table with one column and one row, containing the text; the GetText step in the query above just gets the contents of the only cell in that table.

Power Query Performance And Expanding Columns

As a footnote to my previous post on storing large images in Power BI, I thought all you M nerds out there would be interested to read about a strange performance quirk I noticed while writing the code for that post.

My original attempt to write an M query to convert a folder full of images to text looked something like this:

let
Source =
Folder.Files("C:\Users\Chris\Documents\PQ Pics"),
SplitText =
(ImageBinaryText as text) =>
let
SplitTextFunction =
Splitter.SplitTextByRepeatedLengths(30000),
SplitUpText =
SplitTextFunction(ImageBinaryText)
in
SplitUpText,
#"Added Custom" =
Table.AddColumn(
Source,
"Pic1",
each SplitText(
Binary.ToText(
[Content],
BinaryEncoding.Base64
)
)
),
#"Expanded Pic" =
Table.ExpandListColumn(
#"Added Custom",
"Pic1"
)
in
#"Expanded Pic"

The approach I took was the one that seemed natural to me at the time:

  1. Use the Folder data source to connect to the folder containing the image files
  2. Define a function called SplitText that takes a long piece of text and splits it up into a list of text values no longer than 30000 characters
  3. Call the function once per row on the table returned by step (1)
  4. Use the Expand/Aggregate button to expand the new column created by step (3) and get a table with one row for each of the split-up text values

When I ran this query, though, I caught sight of something that is every Power Query developer’s worst nightmare:

image

Power Query had read 118MB of data from a file that is only 1.6MB: the old problem of multiple file reads. Using Process Monitor (as I describe here) confirmed it. I suspect the query was reading the whole file once for each of the split sections of text the function returned although I admit I didn’t confirm this.

I can’t say I knew what I was doing but I rewrote the query from scratch and came up with the code that I gave in the blog post which only reads from each file once (without using buffering too, I should point out). What’s the difference? I guess it must be the pattern of calling the function once per row in a table and then expanding using Table.ExpandListColumn that was to blame; I tried returning a table from the function instead of a list and the same thing happened. Maybe this is something we should avoid doing? More research is necessary, and, as always, I would be interested to hear about your experiences – it is after all a fairly common pattern.

Storing Large Images In Power BI Datasets

Jason Thomas and Gerhard Brueckl have both blogged on the subject of storing images as text inside a Power BI dataset:

http://sqljason.com/2018/01/embedding-images-in-power-bi-using-base64.html

https://blog.gbrueckl.at/2018/01/storing-images-powerbi-analysis-services-data-models/

Since they wrote those posts, however, Power BI has added the ability to set the Data Category property on measures as well as columns in tables. This means it is now possible to have the output of a DAX measure displayed as an image in a Power BI report and this in turn opens up a lot of new possibilities – including the ability to work around the maximum size of a text value that can be loaded into Power BI (see my previous blog post for more details) and therefore work with larger images.

Here’s a rather lovely picture of a rose:

2014-08-18 17.02.10_20Pct

The original is about 2.1MB; I have a folder on my PC where different versions of this picture, saved at different percentages of the original size, are stored:

image

Using the technique that Gerhard wrote about, where the pictures can be stored as text in a single cell in a Power BI dataset and then displayed (in this case I’m using the Image by CloudScope custom visual) some truncation of the image occurs even with the smallest files because of the 32766 character limit on the length of a text value that can be loaded into Power BI. Here’s what you see when you display the version of the picture that is 20% of the original size, a file of only 113KB:

image

To work around this, what you need to do is to split the text representation of the image up into multiple smaller text values stored across multiple rows, each of which is less than the 32766 character limit, and then reassemble them in a DAX measure after the data has been loaded.

Splitting the text up in M is actually not that hard, but it is hard to do efficiently. Here’s an example of an M query that reads all the data from all of the files in the folder above and returns a table:

let
  //Get list of files in folder 
  Source = Folder.Files("C:\Users\Chris\Documents\PQ Pics"),
  //Remove unnecessary columns 
  RemoveOtherColumns = Table.SelectColumns(Source, {"Content", "Name"}),
  //Creates Splitter function 
  SplitTextFunction = Splitter.SplitTextByRepeatedLengths(30000),
  //Converts table of files to list 
  ListInput = Table.ToRows(RemoveOtherColumns),
  //Function to convert binary of photo to multiple 
  //text values 
  ConvertOneFile = (InputRow as list) =>
    let
      BinaryIn    = InputRow{0},
      FileName    = InputRow{1},
      BinaryText  = Binary.ToText(BinaryIn, BinaryEncoding.Base64),
      SplitUpText = SplitTextFunction(BinaryText),
      AddFileName = List.Transform(SplitUpText, each {FileName, _})
    in
      AddFileName,
  //Loops over all photos and calls the above function 
  ConvertAllFiles = List.Transform(ListInput, each ConvertOneFile(_)),
  //Combines lists together 
  CombineLists = List.Combine(ConvertAllFiles),
  //Converts results to table 
  ToTable = #table(type table [Name = text, Pic = text], CombineLists),
  //Adds index column to output table 
  AddIndexColumn = Table.AddIndexColumn(ToTable, "Index", 0, 1)
in
  AddIndexColumn

In my next post I’ll show you my original version of this query, explain why it was slow, and try to explain how the version above works and why it is much faster.

Here’s what the query above returns:

image

The Pic column contains the split text values, each of which are less than the 32766 character limit, so when this table is loaded into Power BI no truncation occurs. The index column is necessary because without it we won’t be able to recombine all the split values in the correct order.

The only thing left to do is to create a measure that uses the DAX ConcatenateX() function to concatenate all of the pieces of text back into a single value, like so:

Display Image = 
IF( HASONEVALUE('PQ Pics'[Name]), 
"data:image/jpeg;base64, " & 
CONCATENATEX( 'PQ Pics', 'PQ Pics'[Pic], , 'PQ Pics'[Index], ASC) )

…set the data category of this measure to be “Image URL”:

image

…and then display the value of the image in a report:

image
image

Unfortunately, as I also mentioned in my previous post, most DAX functions (and that includes ConcatenateX()) have a limit of around 2.1 million characters so the original 2.1MB file still can’t be displayed, alas:

image

However, I do think this technique will be useful because it allows you to work with much larger pictures than before.

It can also be useful in other situations too. I recently came across a great new custom visual called PDF Viewer that can display PDF files stored in text form in a Power BI report:

image

The example file for this visual shows how a large PDF file can be split across two columns in a table; the technique I describe here is a more practical solution to this problem.

[Update 14th June 2023: the new card visual in Power BI, announced here https://powerbi.microsoft.com/en-us/blog/new-card-visual-public-preview/, supports ImageURLs too as Armand van Amersfoort shows in this video: https://www.youtube.com/watch?v=XkFJFO6p-eE%5D

What Is The Maximum Length Of A Text Value In Power BI?

What is the maximum length of a text value in Power BI? It turns out that this is a more complex question than you might think!

The maximum length of a text value that the Power Query engine can load into a single cell in a table in a dataset is 32766 characters – any more than that and the text will be silently truncated. However, if you’re working with text inside the Power Query engine you’ll find that you can work with much longer text values.  To illustrate this, consider the following M query:

let
Source =
#table(
type table[charcount = number],
{
{1},
{10000},
{30000},
{40000}
}
),
#"Added Custom" =
Table.AddColumn(
Source,
"LongText",
each Text.Repeat("1", [charcount]),
type text
),
#"Inserted Text Length" =
Table.AddColumn(
#"Added Custom",
"Length",
each Text.Length([LongText]),
Int64.Type
)
in
#"Inserted Text Length"

It creates a table with four rows and three columns. The first column contains the numbers 1, 10000, 30000 and 40000; the second column contains the character “1” repeated the number of times given in the first column; the third column returns the length of the text in the second column using the Text.Length() M function. Here’s the output in the Power Query Editor, which is pretty much as you’d expect:

image

I’m not sure if there is a maximum length for text values in M; I experimented with adding an extra row to the table above with a 900,000,000 character text value and Text.Length() was able to return the correct value, albeit after a bit of a wait.

Load the table above into your Power BI dataset though, and add a DAX calculated column with the following expression:

DAX Length = LEN('LengthsDemo'[LongText])

…and you can see in the Data pane of the main Power BI Desktop window that the long text value in the last row has been truncated to 32766 characters:

image

Once you’ve loaded your data into Power BI the documentation says that the maximum length of a text value is “268,435,456 Unicode characters (256 mega characters) or 536,870,912 bytes”. The bad news is that many DAX functions such as ConcatenateX() put a limit on the length of the text values that they can work with at around 2.1 million characters (thank you Jeffrey Wang for providing this information – it isn’t documented anywhere at the moment). If you exceed this limit you’ll get the following error:

Function ‘PLACEHOLDER’ encountered a Text that exceeds the maximum allowable length.

In summary, then, there are two different practical limits on the maximum length of a text value in Power BI: the 32766 character limit on text being loaded into Power BI, and the 2.1 million character limit in DAX functions. The first of these can be worked around with some clever M – you need to split long text values up into multiple smaller values stored in different columns or rows – but even if you do this, the second limit may stop you recreating the original value after the data has been loaded.

Why is this useful or important? How can you split text values up in M in the most efficient way? I’ll come to that in my next two posts!

Power Query Comes To Azure Data Factory With Wrangling Data Flows

One of the many big announcements at Build this week, and one that caused a lot of discussion on Twitter, was about Wrangling Data Flows in Azure Data Factory. You can read the blog post here:

https://azure.microsoft.com/en-us/blog/analytics-in-azure-remains-unmatched-with-new-innovations/

…but what isn’t clear from this is that it’s basically Power Query Online integrated into ADF. You can see it in action by watching the following video – the demo of Wrangling Data Flows starts at around the 21 minute mark:

https://mybuild.techcommunity.microsoft.com/sessions/76997

image

As the presenter says, the Power Query Online editor generates M in the background as you would expect and “we are going to take this M and translate it into Spark and run it over big data”. Query folding to Spark, basically. More technical detail about all this is available here:

https://github.com/gauravmalhot/wranglingdataflow

…including a document discussing which M functions currently support query folding and which ones as yet don’t. Obviously, this feature will only work well if as much query folding as possible takes place.

This feels like a much more significant win for team Power Query than the integration with SSIS that was announced recently, if only because SSIS is a bit legacy and ADF is the cool new thing. I wonder if this opens up the possibility of integration between Power BI dataflows and ADF in the future, as another example of how self-service BI solutions can be easily transitioned into centrally-managed, enterprise-grade BI solutions? If that happens I hope someone sorts out the dataflow/data flow naming mess.

You can sign up for the preview of Wrangling Data Flows here.