Power BI And Case Sensitivity

Case sensitivity is one of the more confusing aspects of Power BI: while the Power Query engine is case sensitive, the main Power BI engine (that means datasets, relationships, DAX etc) is case insensitive. Imke Feldmann shows how to make Power Query case insensitive through custom M here; in this blog post I’m going to concentrate on what case insensitivity means in datasets and DAX and show a way to (kind of) work around it.

Starting with a simple example, let’s say you have a data source that contains data that looks like this:

image

This is what the data looks like in the Power Query Editor which is, as I said, case sensitive – which means that it sees “A” and “a” as different characters, so you see six distinct characters in each row of the table. This could be data from any data source; to make things easy I’ve used the following M query to generate this table:

let
    Source = 
    #table(
        type table 
        [MyTextColumn = text], 
        {
            {"A"},
            {"a"},
            {"B"},
            {"b"},
            {"C"},
            {"c"}
        })
in
    Source

When this data gets loaded into a Power BI dataset, however, you’ll see the following in the Data pane of the main Power BI window:

image

Because the main Power BI engine is case insensitive – so for example “a” and “A” are considered as the same character – when this data is loaded in, it only sees three distinct characters and you can’t be sure whether you’ll get the lower case or upper case character stored twice. This is just the way it works; while an instance of Analysis Services Tabular, which is basically the same engine that is found in Power BI, can be either case sensitive or case insensitive, this option isn’t available in Power BI. There is an idea to support case sensitivity as an option in Power BI here and while there are some valid scenarios where you need this, in my experience case sensitivity causes a lot of problems so I can see why it wasn’t a priority.

What’s more you’ll find that DAX calculations are case insensitive too. If you try the following DAX expression to create a calculated table:

Demo Table = 
DATATABLE(
    "MyTextColumn",
    STRING,
    {
        {"A"},
        {"a"}
    }
)

You’ll get this:

image

DAX measures behave in a similar way. Given the following table:

image

If you use the following measure in a table visual:

Demo Measure = 
IF(
    SELECTEDVALUE('Demo Table 2'[MyIntColumn])=1,
    "A",
    "a"
)

…you’ll see:

image

The only way you can work around this case insensitivity is to make text values that would otherwise look the same to the Power BI engine somehow different. One way of doing this would be to add some extra characters to your text. You might think adding some extra spaces would be the way to go; revisiting the first M query shown above, you could add a space to every lower case character in the table like so:

let
    Source = 
    #table(
        type table 
        [MyTextColumn = text], 
        {
            {"A"},
            {"a "},
            {"B"},
            {"b "},
            {"c "},
            {"C"}
        })
in
    Source

But this doesn’t work because another little-known behaviour of the Power BI engine is that all leading and trailing spaces are trimmed when text is loaded. Anyway, spaces may not be visible but they still take up… well space. A better option – and one that actually works – is to use the Unicode Zero-Width Space character and the M Character.FromNumber function instead like so:

let
    Source = 
    #table(
        type table 
        [MyTextColumn = text], 
        {
            {"A"},
            {"a" & Character.FromNumber(8203)},
            {"B"},
            {"b" & Character.FromNumber(8203)},
            {"c" & Character.FromNumber(8203)},
            {"C"}
        })
in
    Source

The great thing about this character is that although it’s there, it’s invisible and takes no space. When you load this last M query into Power BI you see the following table:

image

Let’s finish off with a more advanced example of how to use this rather excellent character. Say you have the following source data returned by an M query called SourceData:

image

Of course, when this table is loaded into Power BI, you see the following:

image

BUT, the following M query takes this table and for each row goes through the text in the OriginalText column and adds a zero-width space after each lower case character:

let
    Source = 
    SourceData,
    ToList = 
    Table.AddColumn(
        Source, 
        "Chars", 
        each Text.ToList([OriginalText])
        ),
    LowerCaseChars = 
    {"a".."z"},
    AddInvisibleChars = 
    Table.AddColumn(
        ToList, 
        "AddInvisibleChars", 
        each 
        List.Transform(
            [Chars], 
            each 
            if 
            List.Contains(LowerCaseChars, _) 
            then 
            _ & Character.FromNumber(8203) 
            else _
            )
            ),
    RecombineList = 
    Table.AddColumn(
        AddInvisibleChars, 
        "OutputText", 
        each 
        Text.Combine([AddInvisibleChars]), 
        type text
        ),
    RemovedOtherColumns = 
    Table.SelectColumns(
        RecombineList,
        {"OutputText"}
        )

in
    RemovedOtherColumns

When you load the output of this query into Power BI, because the zero-width spaces after each lower case character make Power BI see each piece of text as being different, it looks like case has been preserved even when it hasn’t:

image

You can download the Power BI file for this last example here.

[Thanks to Greg Galloway, Akshai Mirchandani and Jeffrey Wang for their help with this post]

Power BI Custom Format String Examples, Part 2: Dates And Times

In my last post I showed lots of examples of how Power BI’s new custom format string feature can be used to format numbers. This post, looking at dates and times, will be a bit different for two reasons: there are a lot more useful examples of custom date and time formats built into Power BI Desktop, and some of the format placeholders listed in the VBA documentation aren’t supported in Power BI. As a result I’m going to concentrate on some useful formats that aren’t covered well by the examples and highlight a few things that aren’t possible right now.

Here’s the table containing the sample data, in the form of date/time values, for my examples:

image

[Note that the dates shown above are formatted in dd/mm/yyyy format]

The reason I’ve used date/time values for my examples is that they can be used to demonstrate formats for values of data type date and data type time, as well as data type date/time. As with my previous post I’m going to create a series of measures to show the effects of different format strings, each with the same DAX definition:

Eg1 = SELECTEDVALUE(Examples[DateAndTime])

With the default format of :

image

…applied, here’s what the output looks like in a Power BI report:

image

Dates

Let’s start by looking at date formats. The first thing to point out is that you can format a date/time so it only shows the date part and not the time and vice versa. For example, applying the custom format string:

dd/mm/yyyy

image

…where dd is day number, mm is month number and yyyy is a four digit year, gives you:

image

If you’re American and want your months to come before your days you can simply swap the dd and the mm, for example with the format string:

mm/dd/yyyy

image

You’re not forced to use a / as your separator in a date; in fact you can use any character. For example, the custom format:

dd*mm*yyyy

…gives you:

image

If you don’t want a leading zero in front of your day or month number you can use a single d or m, and if you want a two-digit year you can use yy instead of yyyy. So, for example:

d/m/yy

…gives you:

image

You can add full day names and month names using dddd and mmmm, so the format:

dddd dd mmmm yyyy

…gives you:

image

You can also get abbreviated day and month names using ddd and mmm, so:

ddd dd mmm yyyy

…gives you:

image

Last of all, for dates, if you want to make it clear that you’re not showing really, really, really old data you can put a g on the end of your date format like so:

dd/mm/yyyy g

…and you get the following:

image

Although the VBA documentation talks about showing day number of week, week number and quarter number, I haven’t found a way of making that work in Power BI (although I may have missed something and it may be possible).

Times

Times are a bit more straightforward. The main placeholders are hh for hours, mm or nn for minutes and ss for seconds. So for example:

hh:mm:ss

and

hh:nn:ss

…both give you times formatted using the 24-hour clock as follows:

image

If you prefer to use the 12-hour clock you can add AM/PM onto the end of your format string, so:

hh:nn:ss AM/PM

…gives you:

image

The VBA documentation has several variants on AM/PM with slightly different outputs, but this is the only format that I could make work in Power BI.

The very last thing to mention is that, at the time of writing, although Power BI can store times with millisecond values there is no way to make milliseconds appear in a formatted time or date/time. For example in all the screenshots above there are two rows displaying 11/11/2019 15:15:15; they appear as different rows in the table because the millisecond values for each are different, but there’s no way of formatting these values to show that they are different. Hopefully this will be rectified soon; in the meantime you will need to store the millisecond part of any time or date/time separately in a different column in your dataset if you want to display it.

You can download the example .pbix file for this post here.

Power BI Custom Format String Examples, Part 1: Numbers

Now that we can apply custom format strings to fields and measures in Power BI in the September 2019 release, I thought it would be useful to provide some examples of what’s possible with this very flexible new feature because the existing documentation for VBA isn’t easy to make sense of. In fact there’s so much to say I’m going to have to write a series of blog posts to cover everything! In this first post I’m going to look at formatting numbers.

First of all, here’s the source data I’m going to use for my examples:

image

I’m going to create a whole series of identical measures defined like this:

SalesEg1 = SUM('ExampleTable'[Sales])

…and apply different custom format strings to each one so you can compare the output in a Power BI table visual. For reference, here’s what a blank custom format gives you with this measure:

image

image

Let’s start with the basics of formatting numeric values. The first thing to point out is that custom format strings are built up using a series of placeholder characters that allow you to control things like thousands separators, the number of decimal places, whether digits are displayed in a placeholder and so on.

Setting the number of decimal places

As you can see in the screenshots above, two of the values have four decimal places but by default only two decimal places are shown. To always show three decimal places, use the following format string:

0.000

image

Here’s the result:

image

In this case the 0 is a placeholder for a digit that must always be displayed and the . is the decimal separator; three 0s after the . means you always get three decimal places for non-blank numeric values.

Optional digits

You may have noticed in the last screenshot that all numbers show three decimal places, even the value for Pears and the Total. If you don’t want the decimal places to appear – or indeed you don’t want a digit to appear in a particular place if it’s a zero – you can use a # character as a placeholder instead. The following format string:

0.###

…always shows a zero before the decimal separator, but will only show the decimal places if they aren’t zeroes:

image

Thousands separators

If you want to display a thousands separator in your numbers you can use a comma placeholder in your format string, like so:

#,0.###

image

Percentages

If you have values that you want to display as percentages, you can use the % placeholder as follows:

#,0.###%

image

Notice that two things have happened here:

  • A percentage sign has been added to the end of each value
  • The values appear to have been multiplied by 100. They actually haven’t, but the percentage format makes them look as though they have been. Any calculations that reference this measure will still get the unmultiplied value as you would expect.

Currency symbols

If you want currency symbols to appear in your format string you can just add them in either before or after the main part of your format string. For example to put a UK pound sign in a format string you can use the following:

£#,0.00

image

Different formats for positive values, negative values and zeroes

If you need to format positive values, negative values and zeroes differently, you can add up to three different sections to your custom format string separated by a semi colon, as follows:

#,0.0;-#,0.000;0

image

In this case notice how the positive values have one decimal place, the negative value has three decimal places and the zero has no decimal places. In Analysis Services Multidimensional it used to be possible to add a fourth section to format blanks/nulls, but that does not seem to work here unfortunately…

Formatting negative values with parentheses

A common requirement in financial reporting is to format negative values with parentheses (round brackets) instead of a minus sign, and that’s possible with custom format strings. For example:

#,0.0;(#,0.000);0

image

Other Text

You can escape individual characters in your format string by preceding them with a \ placeholder. Say you wanted a # to actually appear in your formatted output and not have it considered as a placeholder, you could use the following:

\##,0.00

image

You can also include whole chunks of text by putting it in double quotes, like so:

“Positive”;”Negative”;”Zero”

image

That’s enough for today; tune in for my next post with even more examples!

You can download the sample pbix file for this post here.

The Second Edition Of “The Definitive Guide To DAX” Is Out!

If you’re a Power BI fan there are three possible answers to the question “Did you know that the second edition of The Definitive Guide To DAX has just been published?”:

Answer#1: Yup, I’ve already got my copy!

If this is your answer there’s no need to read any further.

Answer #2: What’s “The Definitive Guide To DAX”?

If, on the other hand, you’re new to Power BI and this is what you’re thinking then I should explain that “The Definitive Guide To DAX” is a book by Marco Russo and Alberto Ferrari and is what its title suggests it is – the sum total of human knowledge about the DAX calculation and query language used by Power BI, written by the two people who know most about it outside the development team. Marco and Alberto are friends of mine but I don’t think anyone can accuse me of bias when I say that it’s a book that every Power BI developer needs to own, so go out and buy it! If you use Power BI you need to learn DAX and while this book may not be a simple step-by-step tutorial it has in it somewhere answers to just about every question you’ll ever ask about DAX – and, more importantly, the answers it has are as correct and as up-to-date as they possibly can be. I can tell you that it’s proved invaluable to me in my work at least twice in the last week alone.

Answer #3: Yes, I saw that but I already have the first edition – is it worth buying this one too?

This is a slightly more difficult question to answer, but I’m still going to recommend that you buy the second edition. As Marco says in his announcement blog post, a lot of the existing content has been updated and rewritten and a lot of new content has been added. If you care about following all the latest DAX best practices and you don’t want the new hire in your department to mock you because you’ve never heard of DAX Studio, you need to buy this new edition.

[Note: I didn’t get a free copy of this book for review (yet?) but I have an O’Reilly Online Learning account which means I could read it as soon as it was published]

PS I know someone needs to write the “Definitive Guide to M” but it’s not going to be me, at least not right now.

DAX Median() Function Does Not Work On Tables With More Than 2 Billion Rows

An interesting – if obscure – fact I learned recently is that a small number of DAX functions such as Median() do not work on tables with more than 2 billion rows in Analysis Services Tabular, Azure AS and Power BI.

It’s quite easy to reproduce in Power BI. The following M expression returns a table with two billion and four rows:

let
    Source = 
    List.Repeat(
        {1,2,3,4},
        500000001
        ),
    #"Converted to Table" = 
    Table.FromList(
        Source, 
        Splitter.SplitByNothing(), 
        null, 
        null, 
        ExtraValues.Error
        ),
    #"Changed Type" = 
    Table.TransformColumnTypes(
        #"Converted to Table",
        {{"Column1", Int64.Type}}
        )
in
    #"Changed Type"

It takes some time to load this table  – around twenty minutes – but because there are only four distinct values in the table the resulting .pbix file is only 31KB thanks to the way Power BI compresses data.

If you load this table into your dataset, call it VeryBigTable and create the following measure:

Median Test = MEDIAN(VeryBigTable[Column1])

…and use the measure in a visual, you’ll see the following error:

image

The current query cannot be evaluated for the ‘VeryBigTable (42)’ table, because the table contains more than two billion rows.

What’s more, the error will always occur even if you apply a filter to the table that returns less than two billion rows. The same problem occurs with some other functions, such as Percentile(), but it’s worth pointing out that the vast majority of DAX functions work as normal with tables with more than two billion rows – for example, in the pbix file used here the Sum() and CountRows() functions not only work fine but return instantly.

Luckily, in the case of the Median() function, there is an easy workaround because you can calculate a median in other ways such as the one described on the DAX Patterns site here. The code is a lot more verbose but it works on a 2 billion+ row table.

image

Storing Large Images In Power BI Datasets

Jason Thomas and Gerhard Brueckl have both blogged on the subject of storing images as text inside a Power BI dataset:

http://sqljason.com/2018/01/embedding-images-in-power-bi-using-base64.html

https://blog.gbrueckl.at/2018/01/storing-images-powerbi-analysis-services-data-models/

Since they wrote those posts, however, Power BI has added the ability to set the Data Category property on measures as well as columns in tables. This means it is now possible to have the output of a DAX measure displayed as an image in a Power BI report and this in turn opens up a lot of new possibilities – including the ability to work around the maximum size of a text value that can be loaded into Power BI (see my previous blog post for more details) and therefore work with larger images.

Here’s a rather lovely picture of a rose:

2014-08-18 17.02.10_20Pct

The original is about 2.1MB; I have a folder on my PC where different versions of this picture, saved at different percentages of the original size, are stored:

image

Using the technique that Gerhard wrote about, where the pictures can be stored as text in a single cell in a Power BI dataset and then displayed (in this case I’m using the Image by CloudScope custom visual) some truncation of the image occurs even with the smallest files because of the 32766 character limit on the length of a text value that can be loaded into Power BI. Here’s what you see when you display the version of the picture that is 20% of the original size, a file of only 113KB:

image

To work around this, what you need to do is to split the text representation of the image up into multiple smaller text values stored across multiple rows, each of which is less than the 32766 character limit, and then reassemble them in a DAX measure after the data has been loaded.

Splitting the text up in M is actually not that hard, but it is hard to do efficiently. Here’s an example of an M query that reads all the data from all of the files in the folder above and returns a table:

let
    //Get list of files in folder
    Source = Folder.Files("C:\Users\Chris\Documents\PQ Pics"),
    //Remove unnecessary columns
    RemoveOtherColumns = Table.SelectColumns(Source,{"Content", "Name"}),
    //Creates Splitter function
    SplitTextFunction = Splitter.SplitTextByRepeatedLengths(30000),
    //Converts table of files to list
    ListInput = Table.ToRows(RemoveOtherColumns),
    //Function to convert binary of photo to multiple
    //text values
    ConvertOneFile = (InputRow as list) =>
        let
            BinaryIn = InputRow{0},
            FileName = InputRow{1},
            BinaryText = Binary.ToText(BinaryIn, BinaryEncoding.Base64),
            SplitUpText = SplitTextFunction(BinaryText),
            AddFileName = List.Transform(SplitUpText, each {FileName,_})
        in
            AddFileName,
    //Loops over all photos and calls the above function
    ConvertAllFiles = List.Transform(ListInput, each ConvertOneFile(_)),
    //Combines lists together
    CombineLists = List.Combine(ConvertAllFiles),
    //Converts results to table
    ToTable = #table(type table[Name=text,Pic=text],CombineLists),
    //Adds index column to output table
    AddIndexColumn = Table.AddIndexColumn(ToTable, "Index", 0, 1)
in
    AddIndexColumn

In my next post I’ll show you my original version of this query, explain why it was slow, and try to explain how the version above works and why it is much faster.

Here’s what the query above returns:

image

The Pic column contains the split text values, each of which are less than the 32766 character limit, so when this table is loaded into Power BI no truncation occurs. The index column is necessary because without it we won’t be able to recombine all the split values in the correct order.

The only thing left to do is to create a measure that uses the DAX ConcatenateX() function to concatenate all of the pieces of text back into a single value, like so:

Display Image =
IF(
    HASONEVALUE('PQ Pics'[Name]),
    "data:image/jpeg;base64, " &
    CONCATENATEX(
        'PQ Pics',
        'PQ Pics'[Pic],
        ,
        'PQ Pics'[Index],
        ASC)
        )

…set the data category of this measure to be “Image URL”:

image

…and then display the value of the image in a report:

image

image

Unfortunately, as I also mentioned in my previous post, most DAX functions (and that includes ConcatenateX()) have a limit of around 2.1 million characters so the original 2.1MB file still can’t be displayed, alas:

image

However, I do think this technique will be useful because it allows you to work with much larger pictures than before.

It can also be useful in other situations too. I recently came across a great new custom visual called PDF Viewer that can display PDF files stored in text form in a Power BI report:

image

The example file for this visual shows how a large PDF file can be split across two columns in a table; the technique I describe here is a more practical solution to this problem.

What Is The Maximum Length Of A Text Value In Power BI?

What is the maximum length of a text value in Power BI? It turns out that this is a more complex question than you might think!

The maximum length of a text value that the Power Query engine can load into a single cell in a table in a dataset is 32766 characters – any more than that and the text will be silently truncated. However, if you’re working with text inside the Power Query engine you’ll find that you can work with much longer text values.  To illustrate this, consider the following M query:

let
    Source = 
        #table(
            type table[charcount = number],
            {
                {1},
                {10000},
                {30000},
                {40000}
                }
                ),
    #"Added Custom" = 
        Table.AddColumn(
            Source, 
            "LongText", 
            each Text.Repeat("1", [charcount]),
            type text
            ),
    #"Inserted Text Length" = 
        Table.AddColumn(
            #"Added Custom", 
            "Length", 
            each Text.Length([LongText]), 
            Int64.Type
            )
in
    #"Inserted Text Length"

It creates a table with four rows and three columns. The first column contains the numbers 1, 10000, 30000 and 40000; the second column contains the character “1” repeated the number of times given in the first column; the third column returns the length of the text in the second column using the Text.Length() M function. Here’s the output in the Power Query Editor, which is pretty much as you’d expect:

image

I’m not sure if there is a maximum length for text values in M; I experimented with adding an extra row to the table above with a 900,000,000 character text value and Text.Length() was able to return the correct value, albeit after a bit of a wait.

Load the table above into your Power BI dataset though, and add a DAX calculated column with the following expression:

DAX Length = LEN('LengthsDemo'[LongText])

…and you can see in the Data pane of the main Power BI Desktop window that the long text value in the last row has been truncated to 32766 characters:

image

Once you’ve loaded your data into Power BI the documentation says that the maximum length of a text value is “268,435,456 Unicode characters (256 mega characters) or 536,870,912 bytes”. The bad news is that many DAX functions such as ConcatenateX() put a limit on the length of the text values that they can work with at around 2.1 million characters (thank you Jeffrey Wang for providing this information – it isn’t documented anywhere at the moment). If you exceed this limit you’ll get the following error:

Function ‘PLACEHOLDER’ encountered a Text that exceeds the maximum allowable length.

In summary, then, there are two different practical limits on the maximum length of a text value in Power BI: the 32766 character limit on text being loaded into Power BI, and the 2.1 million character limit in DAX functions. The first of these can be worked around with some clever M – you need to split long text values up into multiple smaller values stored in different columns or rows – but even if you do this, the second limit may stop you recreating the original value after the data has been loaded.

Why is this useful or important? How can you split text values up in M in the most efficient way? I’ll come to that in my next two posts!

%d bloggers like this: