Adding More Aggregate Columns To The Output Of Table.Profile

A few years ago I blogged about the Table.Profile M function and how you could use it to create a table of descriptive statistics for your data:

https://blog.crossjoin.co.uk/2016/01/12/descriptive-statistics-in-power-bim-with-table-profile/

Since that post was written a new, optional second parameter has been added to the function called additionalAggregates which allows you to add your own custom columns containing aggregate values to the output of Table.Profile, so I thought I’d write a follow-up on how to use it.

Consider the following query:

let
    Source = 
    #table(
        type table[Month = text, Sales = number],
        {
            {"Jan",1},
            {"Feb",5},
            {"Mar",17},
            {"Apr",1}
        }
    ),
    Profile = 
    Table.Profile(
        Source
    )
in
    Profile

 

There are two steps here. Source returns the following table using the #table function:

image

The second step uses Table.Profile to return a table of summary statistics:

image

[Several columns returned aren’t shown in the screenshot above]

If you want some extra columns on this table you can now add them with the second additionalAggregates parameter of Table.Profile like so:

let
    Source = 
    #table(
        type table[Month = text, Sales = number],
        {
            {"Jan",1},
            {"Feb",5},
            {"Mar",17},
            {"Apr",1}
        }
    ),
    Profile = 
    Table.Profile(
        Source,
        {
        {"Median", each Type.Is(_, type number), List.Median},
        {"Mode", each Type.Is(_, type number), List.Mode}
        }
    )
in
    Profile

In this example I’ve added two extra columns called Median and Mode, which return the median and mode values of every numeric column:

image

The additionalAggregates parameter takes a list, where each item is itself a list of three items:

  • The name of the new column to add.
  • A function to call to determine where the aggregate column will be applied to a given column in the source table. In this case I’m checking to see if a column is type number and only returning a median or mode if it is.
  • A function to call to calculate the aggregate value across the column: in this case List.Median and List.Mode. It’s a good idea to use an aggregate function that does not require storing every value in the column in memory (ideally one that can be folded, if you’re using a data source where query folding is supported) here to ensure fast performance.

[Thanks to Curt Hagenlocher for the information here]

Excel Dynamic Arrays And FilterXML

I’ll admit that I’m a bit less excited about Excel dynamic arrays than I was when I last blogged about them. Don’t get me wrong: from a pure Excel point-of-view they are still very cool, but I’ve since found out that the CubeValue function can’t be used with dynamic arrays which prevents me from doing all the really fun SSAS/Power BI/Power Pivot things I wanted to do with cube formulas.

It’s not all doom and gloom though. Several years ago I blogged about the then-new WebService and FilterXML functions (here and here). I very quickly found that the WebService function was very limited indeed and that Power Query did everything it did but better; on the other hand I felt FilterXML I had some unexplored potential, although I never got round to doing any exploring. Fast forward to last year and I saw that FilterXML was one of a number of existing functions that are affected by the new dynamic array behaviour, a change that makes it even more flexible.

Consider the following, publicly-available XML file:

http://www.hmrc.gov.uk/softwaredevelopers/rates/exrates-monthly-0719.xml

It’s a list of exchange rates published by the UK tax authorities and the contents look like this:

image

Just the kind of data you want to bring into Excel. The WebService function doesn’t work with this URL for some reason but it’s very easy to bring this data into an Excel table using Power Query with a few clicks using the Web data source:

image

If you prefer to work with dynamic arrays though (and I’m sure there are lots of reasons why that might be true), you can do that with a combination of Power Query and FilterXML.

The first thing to do is to use Power Query to load the entire XML document into a single cell in a worksheet. Here’s a query to do this:

let
    Source = 
    Text.FromBinary(
        Web.Contents(
            "http://www.hmrc.gov.uk/softwaredevelopers" &
            "/rates/exrates-monthly-0719.xml"
        )
    ),
    #"Converted to Table" = 
    #table(
        1, 
        {{Source}}
    ),
    #"Cleaned Text" = 
    Table.TransformColumns(
        #"Converted to Table",
        {{"Column1", Text.Clean, type text}}
    ),
    #"Replaced Value" = 
    Table.ReplaceValue(
        #"Cleaned Text",
        "> <",
        "><",
        Replacer.ReplaceText,{"Column1"}
    ),
    #"Replaced Value1" = 
    Table.ReplaceValue(
        #"Replaced Value",
        ">  <",
        "><",
        Replacer.ReplaceText,{"Column1"}
    ),
    #"Replaced Value2" = 
    Table.ReplaceValue(
        #"Replaced Value1",
        ">   <",
        "><",
        Replacer.ReplaceText,
        {"Column1"}
    ),
    #"Replaced Value3" = 
    Table.ReplaceValue(
        #"Replaced Value2",
        ">    <",
        "><",
        Replacer.ReplaceText,{"Column1"}
    )
in
    #"Replaced Value3"

One interesting point to make here: the FilterXML function does not like spaces between closing and opening angle brackets in XML (maybe this is why WebService errors too?) so I’m removing all occurrences of this, as well as removing any unprintable characters. Here’s the output, a table with one column and one row where the only cell contains the full XML:

image

You can then use an Excel formula like this to run an XPath query against this XML document:

=FILTERXML($A$2, "/exchangeRateMonthList/exchangeRate/countryName")

to get a list of all the country names spilling out to as many rows in the worksheet as necessary:

3d266a16-fa27-4927-9f79-558be7891293

[If you want a comparison with how FilterXML used to work in all its CTRL+SHIFT+ENTER glory see the “Scraping a whole XML document” section here; if you want to learn XPath, the query language used by FilterXML, there is a good tutorial here]

I’m not an XPath expert, or even an Excel expert, so I’ll finish here but hopefully this will prove useful to someone. You can download an Excel workbook containing the demos from this post here – note that neither Power Query nor dynamic arrays work in Excel Online yet, so don’t look at the workbook in the browser.

SSASDiag: A Tool To Help Diagnose Analysis Services Problems

There are a lot of great community-developed tools out there for Analysis Services developers to use (BI Developer Extensions, DAX Studio, Tabular Editor, Analysis Services Query Analyzer to name a few) and they have saved me vast amounts of time and effort over the years. When I joined Microsoft last month I came across one which I had never seen before but which is nevertheless quite mature and feature-rich: the SSAS Diagnostics Tool or SSASDiag for short. It’s available on GitHub here:
https://github.com/ssasdiag/SSASDiag

…and you can read the documentation here:
https://github.com/ssasdiag/SSASDiag/wiki/SSAS-Diagnostics—Analysis

image

It’s an open source tool developed by the people who support Analysis Services here at Microsoft and is intended to help them collect and analyse the information they need to troubleshoot on-premises SSAS  issues, but it’s available for anyone to use. I haven’t had a chance to take a proper look at it yet myself, unfortunately, but I thought it would be interesting for any SSAS fans out there to check out.

[Thanks to Jon Burchel for providing all the background information for this post]

Sharing Data Between Organisations With Azure Data Share

If you’ve ever built a BI solution it’s likely you will have had to integrate third party data, and if that’s the case you will know how painful it often is to get your hands on that data. Badly designed portals you have to log into every week to download the data, CSV files emailed to you, APIs with complex authentication – it’s usually an unreliable, slow and manual process. This is why I was interested to read about a new Azure service that previewed this week called Azure Data Share that aims to provide a simple and secure way to share data between organisations.

You can read the announcement blog post here:

https://azure.microsoft.com/en-us/blog/announcing-preview-of-azure-data-share/

…read the documentation here:

https://docs.microsoft.com/en-us/azure/data-share/

…and watch an introductory video here:

https://channel9.msdn.com/Shows/Azure-Friday/Share-data-simply-and-securely-using-Azure-Data-Share

What could this be used for in Power BI? Well, just for fun I uploaded the contents of a Common Data Model folder to a storage account in one Azure account and used Azure Data Share to copy that folder into a storage account in another Azure account. In the destination the data was stored in Azure Data Lake Gen2 storage, so I was able to attach the CDM folder as a dataflow in Power BI. It took a lot of trial-and-error on my part to get the permissions on the various storage accounts working properly (I’m not an expert on this…) but it worked. This proves that third-party data can be exposed directly inside Power BI as a dataflow using Azure Data Share, which I think is pretty darned cool. If you’re a company that sells data and you want your customers to be able to consume that data easily in Power BI, I think this might be a good way to do it.

Why Power BI Slicers Sometimes Show Values That Do Not Exist

The problem of blank values being shown in a Power BI slicer is fairly well known, but did you know that in some circumstances slicers can show other values that you would not expect to see? While there’s a good reason for this it can nevertheless be mildly confusing for developers and end users alike when it happens.

Let’s take a look at one way this can happen. Here’s an M query that returns a table with up to five random numbers in:

let
    Source = 
        {1..5},
    RandomNumbers = 
        List.Transform(
            Source, 
            each 
                Number.Round(
                    Number.RandomBetween(
                        _, 
                        _+100
                        )
                , 0)
                ),
    RemoveDuplicates = 
        List.Distinct(RandomNumbers),
    ToTable = 
        Table.FromList(
            RemoveDuplicates, 
            Splitter.SplitByNothing(), 
            null, 
            null, 
            ExtraValues.Error
            ),
    ChangedType = 
        Table.TransformColumnTypes(
            ToTable,
            {{"Column1", Int64.Type}}
            )
in
    ChangedType

Here’s an example of the output:

image

The key point to notice is that every time this query refreshes, a potentially different set of numbers will be returned.

Now consider a report page built from this table with three visuals on it:

image

There is a slicer on the left with five items in it, a table showing the actual contents of the table (I’ve disabled visual interactions so the slicer doesn’t slice the table) with five rows and a card showing the output of the following measure:

Selected Number = 
SELECTEDVALUE(MyNumbers[Column1], "Nothing Selected")

In the screenshot above you can see I have selected the value 78 in the slicer and the measure – as you would expect – displays that value.

Now what happens when you refresh the dataset and the table contains a different set of numbers? Here’s what:

SlicerNewValues

 

Summarising what is shown here:

  • After the refresh has taken place the number 78 no longer appears in the underlying table, which contains five rows still
  • However, because 78 was selected before refresh took place, it still appears – and is still selected in – the slicer as an extra value meaning there are now six items visible in the slicer
  • Even though 78 appears and is selected, the measure returns “Nothing Selected” because the DAX SelectedValue() function has detected that no single value has been selected
  • As soon as another value in the slicer is selected (in this case 83), the value 78 disappears from the slicer

Something similar may occur when using the Sync Slicers functionality to synchronise two slicers with different source columns that contain different values (as described here). For example, say you have a second table created as a DAX calculated table with the following expression:

MySecondTable = ROW("Column2",-1)

image

If you create a slicer from this table, use Sync Slicers to synchronise it with the existing slicer on the report shown above:

image

…and then select the value –1 in the new slicer, you’ll see it results in the value –1 appearing as selected in the original slicer even though that value does not appear in the underlying column:

image

Once again, if you select something else in the slicer, the –1 value will disappear.

In my opinion this behaviour makes sense and shouldn’t be changed. In any case, in the real world I always try to avoid situations where the values in a slicer might disappear: if you have a dimensional model and are building slicers from dimension tables rather than fact tables, you should find that values are rarely removed from dimension tables, if ever. That said I can’t help wondering if there are situations where this behaviour could be useful, for example in scenarios where you want to show an initial selection that displays no data but which can’t subsequently be reselected by an end user. As always, I’m interested in your thoughts and comments.

Creating Basic Greater Than Or Less Than Rules With Power BI Conditional Formatting

Although the conditional formatting by rules feature in Power BI was released a long time ago, one very common cause of confusion is with how to implement basic “greater than” or “less than” rules. For example, say you have a table with the following data in it:

image

…and you want to highlight the rows where Sales are greater than or equal to 150. So you click on the table, go to the Conditional Formatting options for the Sales field, turn on Background Color formatting and click Advanced Controls:

image

…then choose to format by rules. You see this dialog:

image

In particular, the part of this dialog where you set up the rule:

image

…seems to suggest that you need to enter a “is greater than or equal to” condition and a “is less than” condition for the rule to be valid. This is not true, and you don’t need to enter some arbitrarily large number for the “is less than” condition to make it work. The following screenshot shows how you can set up a rule to highlight all rows where Sales are greater than or equal to 150:

image

The two things to notice are:

  • I’ve entered 150 in the first condition, as you would expect
  • I have deleted the 0 from the second “is less than” condition, leaving the textbox empty (meaning that the text “Maximum” is visible but greyed out)

Here’s the result:

image

Job done. What about a slightly more complex but nonetheless common scenario where values greater than 0 are shown as green, values equal to 0 as yellow and values less than 0 as red (with maybe some special handling for blanks too)? Here’s a sample table:

image

(The Dummy column is only there to make sure the Oranges row, which contains a blank value, is visible in the table)

Here’s a set of rules that does what we need:

image

And here’s the output:

image

Q&A Buttons And Power BI Service Live Connections

A quick Q&A-related tip. If you create a report in Power BI Desktop and use a Power BI service live connection:

image

…and then you put a Q&A button on the report, clicking on the button gives the following error message:

image

Q&A isn’t supported for reports that use DirectQuery data sources, live connections to older Analysis Services data sources, or non-English language models

However, this message only means that Q&A buttons do not work in Power BI Desktop with Power BI service live connections. If you publish the report to the Power BI Service you will find that the Q&A button does work correctly.

image

[Thanks to Seth Bauer and Aaron Meyers for the information]

%d bloggers like this: