Power BI Custom Format String Examples, Part 1: Numbers

Now that we can apply custom format strings to fields and measures in Power BI with the September 2019 release, I thought it would be useful to provide some examples of what’s possible with this very flexible new feature, because the existing VBA documentation for format strings isn’t easy to make sense of. In fact there’s so much to say that I’m going to have to write a series of blog posts to cover everything! In this first post I’m going to look at formatting numbers.

First of all, here’s the source data I’m going to use for my examples:

image

I’m going to create a whole series of identical measures defined like this:

SalesEg1 = SUM('ExampleTable'[Sales])

…and apply different custom format strings to each one so you can compare the output in a Power BI table visual. For reference, here’s what a blank custom format gives you with this measure:

image

image

Let’s start with the basics of formatting numeric values. The first thing to point out is that custom format strings are built up using a series of placeholder characters that allow you to control things like thousands separators, the number of decimal places, whether a digit is displayed in a particular position, and so on.

Setting the number of decimal places

As you can see in the screenshots above, two of the values have four decimal places but by default only two decimal places are shown. To always show three decimal places, use the following format string:

0.000

image

Here’s the result:

image

In this case the 0 is a placeholder for a digit that must always be displayed and the . is the decimal separator; three 0s after the . means you always get three decimal places for non-blank numeric values.

Optional digits

You may have noticed in the last screenshot that all numbers show three decimal places, even the value for Pears and the Total. If you don’t want the decimal places to appear – or indeed you don’t want a digit to appear in a particular place if it’s a zero – you can use a # character as a placeholder instead. The following format string:

0.###

…always shows a zero before the decimal separator, but will only show the decimal places if they aren’t zeroes:

image

Thousands separators

If you want to display a thousands separator in your numbers you can use a comma placeholder in your format string, like so:

#,0.###

image

Percentages

If you have values that you want to display as percentages, you can use the % placeholder as follows:

#,0.###%

image

Notice that two things have happened here:

  • A percentage sign has been added to the end of each value
  • The values appear to have been multiplied by 100. They actually haven’t: the percentage format just makes them look as though they have been, and any calculations that reference this measure will still get the unmultiplied value, as the example below shows.
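
You can check this for yourself with a second measure that references the percentage-formatted one. For example, assuming the format string above was applied to the SalesEg1 measure (the new measure name here is just for illustration), a measure defined as:

SalesEg1Doubled = [SalesEg1] * 2

…returns exactly double the raw Sales value, not double the displayed percentage – if the values really had been multiplied by 100, the result would be 100 times larger.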

Currency symbols

If you want a currency symbol to appear in your formatted output you can just add it either before or after the main part of your format string. For example, to put a UK pound sign in a format string you can use the following:

£#,0.00

image

Different formats for positive values, negative values and zeroes

If you need to format positive values, negative values and zeroes differently, you can add up to three different sections to your custom format string separated by semicolons, as follows:

#,0.0;-#,0.000;0

image

In this case notice how the positive values have one decimal place, the negative value has three decimal places and the zero has no decimal places. In Analysis Services Multidimensional it used to be possible to add a fourth section to format blanks/nulls, but that does not seem to work here unfortunately…

Formatting negative values with parentheses

A common requirement in financial reporting is to format negative values with parentheses (round brackets) instead of a minus sign, and that’s possible with custom format strings. For example:

#,0.0;(#,0.000);0

image

Other Text

You can escape individual characters in your format string by preceding them with a \ character. Say you wanted a # to actually appear in your formatted output and not have it treated as a placeholder: you could use the following:

\##,0.00

image

You can also include whole chunks of text in your formatted output by enclosing them in double quotes, like so:

"Positive";"Negative";"Zero"

image

That’s enough for today; tune in for my next post with even more examples!

You can download the sample pbix file for this post here.

The AutoSetDefaultInitialCatalog Analysis Services Server Property

In Shabnam Watson’s recent blog post on a bug she found when trying to create a Live connection from Power BI to Analysis Services, she mentioned that the AutoSetDefaultInitialCatalog server property could be used to solve her problem. This piqued my interest because I’d seen this property before but had no idea what exactly it did or why it was there. Luckily, now that I work for Microsoft, it’s even easier for me to find out about things like this from the dev team, and Akshai Mirchandani was able to help.

First of all, what does it do? The documentation on this property has just been added here, and this is what it says:

AutoSetDefaultInitialCatalog
A Boolean property. When set to true, new client connections automatically default to the first catalog (database) the user has permissions to connect to.
When set to false, no initial catalog is specified. Clients must select a default catalog prior to running queries or discover operations against a database on the server. If no default catalog is specified, an error is returned. If Initial Catalog property is specified in the connection string, the default catalog will be applied from this property.

The default value for this property is true.

Let me illustrate what this means. Say you have an instance of Analysis Services (in this case it’s Tabular, but it could be Multidimensional) with two databases on it:

image

I’ve expanded the Roles tab for each database for reasons that will become clear later.

Next, let’s say you run a simple trace on this server looking at the Discover End and Session Initialize events:

image

…and while this trace is running, you open up SQL Server Management Studio and connect to the SSAS instance. Here’s what you see in Profiler:

image

Now, just to be clear, all I did was open up SQL Server Management Studio and connect to the instance. I did not open up a DAX query window or anything like that; all that happened was the list of databases on the instance was displayed in the Object Explorer pane.

image

The interesting thing to notice in the trace above is that there are five Session Initialize events and, even though the Database column in Profiler is blank, you can see from the list of role names in the TextData column that in each case a connection has been made to the Adventure Works Internet Sales database.

This is because when you open a connection to Analysis Services and do not set the Initial Catalog connection string property, you get a connection to the default database on the instance. Which database is the default? It’s just the first database that the user has permission to access on the instance, which is a bit random.
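
One way to avoid this behaviour from the client side is to set the Initial Catalog property explicitly in the connection string. For example, a connection string something like the following (the server name here is just for illustration):

Provider=MSOLAP;Data Source=localhost\tabular;Initial Catalog=EmptyDB

…always opens a connection to the EmptyDB database rather than to whichever database happens to be the default.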

This happens at other times too. Let’s say you right click on the EmptyDB database and process it in SQL Management Studio:

image

Here’s what I see in Profiler:

image

In this case there are three connections to the default database, Adventure Works Internet Sales, when the database I am processing is EmptyDB!

Most of the time these unnecessary connections have no impact at all but sometimes they can cause problems such as the ones Shabnam describes in her blog post. For example:

  • It can cause performance problems, because there is an overhead to opening a connection – for example roles are evaluated when a connection is opened
  • Monitoring and auditing gets complicated because, as you can see from the traces above, there are a whole lot of connections to the default database taking place that you aren’t expecting
  • Most importantly, when a connection is opened a read-commit lock is acquired on that database and in a few rare cases this can cause deadlocks and other locking-related issues

This is why the AutoSetDefaultInitialCatalog server property was introduced. With this server property set to False, when you open a connection to SSAS with no Initial Catalog set you get a connection with no database set. You can find this server property in SQL Server Management Studio in the Analysis Server properties dialog: right-click on your instance name in the Object Explorer, select Properties, go to the General tab and check the Advanced (All) Properties box.

image

With AutoSetDefaultInitialCatalog set to False, here’s what Profiler shows when I rerun my original test of connecting to SQL Server Management Studio:

image

Note that there are now no Session Initialize events.

Here’s what opening up a new MDX query window in SQL Management Studio shows with AutoSetDefaultInitialCatalog set to False if you don’t explicitly set a database when you connect:

image

image

Note the empty database dropdown box on the toolbar and the “Error loading metadata: No cubes were found” error message shown in the Metadata pane.

So why didn’t the dev team set AutoSetDefaultInitialCatalog to False by default on new instances? The problem with doing this is that it is a potential breaking change that could cause errors in some client tools. I’m not aware of any specific cases where this might happen, but if you did decide to change AutoSetDefaultInitialCatalog to False on your instance you would need to test thoroughly to make sure it didn’t break anything. My feeling, though, is that it is probably a good idea to set AutoSetDefaultInitialCatalog to False on production servers and do the appropriate testing, just in case those unnecessary connections are causing problems.

The Second Edition Of “The Definitive Guide To DAX” Is Out!

If you’re a Power BI fan there are three possible answers to the question “Did you know that the second edition of The Definitive Guide To DAX has just been published?”:

Answer#1: Yup, I’ve already got my copy!

If this is your answer there’s no need to read any further.

Answer #2: What’s “The Definitive Guide To DAX”?

If, on the other hand, you’re new to Power BI and this is what you’re thinking then I should explain that “The Definitive Guide To DAX” is a book by Marco Russo and Alberto Ferrari and is what its title suggests it is – the sum total of human knowledge about the DAX calculation and query language used by Power BI, written by the two people who know most about it outside the development team. Marco and Alberto are friends of mine but I don’t think anyone can accuse me of bias when I say that it’s a book that every Power BI developer needs to own, so go out and buy it! If you use Power BI you need to learn DAX and while this book may not be a simple step-by-step tutorial it has in it somewhere answers to just about every question you’ll ever ask about DAX – and, more importantly, the answers it has are as correct and as up-to-date as they possibly can be. I can tell you that it’s proved invaluable to me in my work at least twice in the last week alone.

Answer #3: Yes, I saw that but I already have the first edition – is it worth buying this one too?

This is a slightly more difficult question to answer, but I’m still going to recommend that you buy the second edition. As Marco says in his announcement blog post, a lot of the existing content has been updated and rewritten and a lot of new content has been added. If you care about following all the latest DAX best practices and you don’t want the new hire in your department to mock you because you’ve never heard of DAX Studio, you need to buy this new edition.

[Note: I didn’t get a free copy of this book for review (yet?) but I have an O’Reilly Online Learning account which means I could read it as soon as it was published]

PS I know someone needs to write the “Definitive Guide to M” but it’s not going to be me, at least not right now.

Setting Azure Analysis Services Server Properties Not Visible In SQL Server Management Studio

Users of on-premises Analysis Services will know that while most of the useful server properties can be set in SQL Server Management Studio, some (such as MaxIntermediateRowsetSize) can only be set by editing the msmdsrv.ini file. How do you set these properties in Azure Analysis Services, though, when there is no msmdsrv.ini file to edit?

The solution is to use an XMLA script to make the change. The easy way to do this is to open up the server properties dialog in SQL Management Studio by right clicking on your instance name in the Object Explorer and selecting Properties:

image

Then, in the server properties dialog, change any server property but do not click OK. Instead, click on the Script button and then select Script Action to New Query Window:

image

This will create a new XMLA query window in SSMS containing the script to make the server property change you just made. (The connection dialog for this window will open too, which freezes the server properties dialog behind it, so you’ll need to either connect or dismiss that dialog before you can close the server properties dialog.) The actual change won’t take place, though, unless you execute the script – so don’t do that yet.

Instead, change the name of the server property in the script to the one you actually want to set and enter the value you want to set it to:

image

Note that you can’t just enter the name of the server property in most cases because server properties can be grouped into sections, so you’ll need to enter the section names too. For example for the MaxIntermediateRowsetSize property you’ll need to enter DAX\DQ\MaxIntermediateRowsetSize.
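
To give you an idea of what to expect, here’s a rough sketch of what the edited script might look like for the MaxIntermediateRowsetSize property – the server ID, name and value here are placeholders, and the rest of the script will be whatever SSMS generated for your server:

[sourcecode language='text' ]
<Alter AllowCreate="true" ObjectExpansion="ObjectProperties" xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Object />
  <ObjectDefinition>
    <!-- The ID, Name and Value below are placeholders -->
    <Server xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
      <ID>asazure://myregion.asazure.windows.net/myserver</ID>
      <Name>myserver</Name>
      <ServerProperties>
        <ServerProperty>
          <Name>DAX\DQ\MaxIntermediateRowsetSize</Name>
          <Value>2000000</Value>
        </ServerProperty>
      </ServerProperties>
    </Server>
  </ObjectDefinition>
</Alter>
[/sourcecode]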

After that, all you need to do is hit the Execute button and the change will be made.

Tables, Numbers, Immutability And Power Query Performance

Following on from my last two posts on Power Query performance, I thought I would write about another trick that can make a big difference to your query execution times. It’s actually something that Ehren from the dev team mentioned in a forums thread that I blogged about here but it deserves to be better known.

Here’s one of the almost-optimised queries from my previous posts:

[sourcecode language='html'  padlinenumbers='true']
let
    Source = 
    Csv.Document(
        File.Contents(
            "C:\Users\chwebb\Downloads\pp-2018.csv"
            ),
            [Delimiter=",", Columns=16, 
            Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Changed Type" = 
    Table.TransformColumnTypes(
        Source,
        {{"Column2", Int64.Type}}
        ),
    #"Sorted Rows" = 
    Table.Sort(
        #"Changed Type",
        {{"Column2", Order.Descending}}
        ),
    Column2 = 
    #"Sorted Rows"[Column2]{0}
in
    Column2
[/sourcecode]

This query returns a number (the maximum value in the column called Column2 of the table loaded in from a csv file) and takes about 5 seconds to run. Now consider the following query, which is the same as the previous query but with only the last line different:

[sourcecode language='html'  padlinenumbers='true']
let
    Source = 
    Csv.Document(
        File.Contents(
            "C:\Users\chwebb\Downloads\pp-2018.csv"
            ),
            [Delimiter=",", Columns=16, 
            Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Changed Type" = 
    Table.TransformColumnTypes(
        Source,
        {{"Column2", Int64.Type}}
        ),
    #"Sorted Rows" = 
    Table.Sort(
        #"Changed Type",
        {{"Column2", Order.Descending}}
        ),
    Column2 = 
    #"Sorted Rows"[Column2]{0}
in
    Column2 + Column2 + Column2 + Column2
[/sourcecode]

This query returns the number returned by the previous query added together four times. As you would expect – and hope – this query also takes about 5 seconds to run. Now consider this query:

[sourcecode language='html'  padlinenumbers='true']
let
    Source = 
    Csv.Document(
        File.Contents(
            "C:\Users\chwebb\Downloads\pp-2018.csv"
            ),
            [Delimiter=",", Columns=16, 
            Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Changed Type" = 
    Table.TransformColumnTypes(
        Source,
        {{"Column2", Int64.Type}}
        ),
    #"Sorted Rows" = 
    Table.Sort(
        #"Changed Type",
        {{"Column2", Order.Descending}}
        )
in
    #"Sorted Rows"[Column2]{0} +
    #"Sorted Rows"[Column2]{0} +
    #"Sorted Rows"[Column2]{0} +
    #"Sorted Rows"[Column2]{0}
[/sourcecode]

Instead of taking the value #"Sorted Rows"[Column2]{0} and storing it in the variable Column2 then adding Column2 four times, I’m adding the expression #"Sorted Rows"[Column2]{0} together four times. The query returns the same number as the previous query. However this query takes 20 seconds to run! Why?

As well as the brief remarks at the end of the post I mentioned earlier, more details can be found in section 3.6 of the Power Query Language Specification on Immutability:

Once a value has been calculated, it is immutable, meaning it can no longer be changed. This simplifies the model for evaluating an expression and makes it easier to reason about the result since it is not possible to change a value once it has been used to evaluate a subsequent part of the expression. For instance, a record field is only computed when needed. However, once computed, it remains fixed for the lifetime of the record. Even if the attempt to compute the field raised an error, that same error will be raised again on every attempt to access that record field.


An important exception to the immutable-once-calculated rule applies to list and table values. Both have streaming semantics. That is, repeated enumeration of the items in a list or the rows in a table can produce varying results. Streaming semantics enables the construction of M expressions that transform data sets that would not fit in memory at once.

In the second query above the variable Column2 returns a number, and after that number has been assigned to Column2 it cannot be changed. However in the last line of the third query, when I add #"Sorted Rows"[Column2]{0} together four times, #"Sorted Rows" is a variable that returns a table – which means the table gets evaluated four times during the addition.

In summary, if you’re getting a table or list from a data source and you want to use a value inside that table or list multiple times, store it in a variable in a let expression before you do so.
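
Here’s a minimal sketch of the pattern using an inline #table instead of a real data source:

[sourcecode language='text' ]
let
    Source = 
    #table(
        type table[Sales = number],
        {{1},{5},{17},{1}}
    ),
    //Store the scalar in a variable so the table is only
    //evaluated once, no matter how many times it is reused
    MaxSales = List.Max(Source[Sales])
in
    MaxSales + MaxSales + MaxSales + MaxSales
[/sourcecode]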

Power Query/M Optimisation: Getting The Maximum Value From A Column, Part 2

In part 1 of this series – which I strongly recommend you read before reading this post – I showed how removing columns from a table can make a dramatic improvement to the performance of certain transformations in Power Query. In this post I’ll show some tricks taught to me by Curt Hagenlocher of the dev team that can improve performance even more.

First of all, let me remind you of my original, totally unoptimised query which takes over a minute to run:

[sourcecode language='html'  padlinenumbers='true']
let
    Source = 
    Csv.Document(
        File.Contents(
            "C:\Users\chwebb\Downloads\pp-2018.csv"
            ),
            [Delimiter=",", Columns=16, 
            Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Changed Type" = 
    Table.TransformColumnTypes(
        Source,
        {{"Column2", Int64.Type}}
        ),
    #"Sorted Rows" = 
    Table.Sort(
        #"Changed Type",
        {{"Column2", Order.Descending}}
        ),
    Column2 = 
    #"Sorted Rows"{0}[Column2]
in
    Column2
[/sourcecode]

In particular, I’d like to draw your attention to the last step where the position and lookup operators (I blogged about them here) are used to return the value in the Column2 column of the first line of the sorted table:

#"Sorted Rows"{0}[Column2]

Breaking this expression down:

  • #"Sorted Rows" returns the sorted table from the previous step in the query
  • {0} returns the first line of that table as a record
  • [Column2] gives the value in the Column2 field from that record

It turns out that changing the order of {0} and [Column2] in this expression has a big impact on performance, so using the following expression in the last step instead:

#"Sorted Rows"[Column2]{0}

…reduces query execution time to 5-7 seconds! I’m told this is because asking for the column first does the same thing internally as manually removing all other columns in the table which, as I showed in my last post, has a dramatic impact on performance.

It’s important to understand that this is not a general recommendation to request the column before the row in expressions like this, because in other scenarios requesting the row before the column might perform better. What you need to do is test writing expressions like this both ways to see what gives you the best performance.

One piece of general advice that Curt did give me, though, was that most of the optimisations that the Power Query engine can use only happen on tables – not records or lists – so you should always try to work with tables whenever you can. For this particular query, using the Table.FirstN function to get the first row of the table like so:

Table.FirstN(#"Sorted Rows",1)[Column2]{0}

…allows for another internal optimisation to kick in, taking the query execution time down to around 2 seconds – the same performance as the original query in my previous post.
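
Putting this all together, here’s what the fully optimised version of the query from part 1 looks like:

[sourcecode language='html'  padlinenumbers='true']
let
    Source = 
    Csv.Document(
        File.Contents(
            "C:\Users\chwebb\Downloads\pp-2018.csv"
            ),
            [Delimiter=",", Columns=16, 
            Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Changed Type" = 
    Table.TransformColumnTypes(
        Source,
        {{"Column2", Int64.Type}}
        ),
    #"Sorted Rows" = 
    Table.Sort(
        #"Changed Type",
        {{"Column2", Order.Descending}}
        ),
    Column2 = 
    Table.FirstN(#"Sorted Rows",1)[Column2]{0}
in
    Column2
[/sourcecode]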

Power Query/M Optimisation: Getting The Maximum Value From A Column, Part 1

I’ve just learned some really interesting new Power Query/M optimisation tricks! I’m going to write them up here as a series of worked examples rather than generalised patterns because I’m still trying to understand them properly myself and I’m not really sure what lessons to draw from them exactly; instead, I hope they give you some new ideas to try when optimising your own queries. I do think they will be useful to a lot of people though.

In this first part I’m going to set up the scenario and show you what I found out from my own experimentation. The really mind-blowing tricks shown to me by the ever-helpful Curt Hagenlocher of the Power Query dev team will be covered in part 2.

Let’s say you have a large csv file which contains a numeric column and you want to get the maximum value from that column. In this case I’m going to use the 2018 Price Paid data from the UK Land Registry available here. This csv file contains 1021215 rows, one for each property transaction in England and Wales in 2018; the second column in this file contains the price paid for the property, so the aim here is to get the maximum price paid for all property transactions in 2018.

You can build this query quickly and easily, and get excellent performance, with a few clicks in the UI. After connecting to the csv file and setting the data type on Column2 to Whole Number, all you need to do is select Column2, go to the Transform tab in the Power Query Editor window, click the Statistics button and then select Maximum from the dropdown menu:

image

This returns the number we’re looking for:

image

The query takes around 1.5 seconds to run (I used the technique I blogged about here to measure the duration). Here’s the M code for the query:

[sourcecode language='text'  padlinenumbers='true']
let
    Source = 
    Csv.Document(
        File.Contents(
            "C:\Users\chwebb\Downloads\pp-2018.csv"
            ),
            [Delimiter=",", Columns=16, 
            Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Changed Type" = 
    Table.TransformColumnTypes(
        Source,
        {{"Column2", Int64.Type}}
        ),
    #"Calculated Maximum" = 
    List.Max(#"Changed Type"[Column2])
in
    #"Calculated Maximum"
[/sourcecode]

I, of course, did not use this very efficient method when I first built my query. Instead I did the following: after loading the data I sorted the table by Column2 in descending order and then right-clicked in the top cell in Column2 and selected Drill Down:

image

Here’s the resulting M code:

[sourcecode language='text'  padlinenumbers='true']
let
    Source = 
    Csv.Document(
        File.Contents(
            "C:\Users\chwebb\Downloads\pp-2018.csv"
            ),
            [Delimiter=",", Columns=16, 
            Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Changed Type" = 
    Table.TransformColumnTypes(
        Source,
        {{"Column2", Int64.Type}}
        ),
    #"Sorted Rows" = 
    Table.Sort(
        #"Changed Type",
        {{"Column2", Order.Descending}}
        ),
    Column2 = 
    #"Sorted Rows"{0}[Column2]
in
    Column2
[/sourcecode]

The performance of this query is much worse: 75 seconds, although this varies a lot. So I tried to work out what’s going on here and see if I could improve performance… and that’s when I started learning.

The variation in the amount of time taken to run the query made me think about memory usage and the 256MB container size limit (see this blog post for important background information) and sure enough, Resource Monitor showed that this query was hitting the 256MB limit – unsurprising because sorting is one of those transformations that requires a table to be loaded into memory completely (at least in the worst case – and this seemed like the worst case). Why not reduce the size of the table then? Since only Column2 is needed by the query output I removed all other columns in the table before doing the sort, resulting in the following M:

[sourcecode language='text'  padlinenumbers='true']
let
    Source = 
    Csv.Document(
        File.Contents(
            "C:\Users\chwebb\Downloads\pp-2018.csv"
            ),
            [Delimiter=",", Columns=16, 
            Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Changed Type" = 
    Table.TransformColumnTypes(
        Source,
        {{"Column2", Int64.Type}}
        ),
    #"Removed Other Columns" = 
    Table.SelectColumns(
        #"Changed Type",
        {"Column2"}
        ),
    #"Sorted Rows" = 
    Table.Sort(
        #"Removed Other Columns",
        {{"Column2", Order.Descending}}
        ),
    Column2 = 
    #"Sorted Rows"{0}[Column2]
in
    Column2
[/sourcecode]

This reduced query execution time a lot – it still varied, but now it was in the range of 5 to 8 seconds.

This leads to the first important performance tuning tip: remove all unnecessary columns from your tables to reduce memory overhead, especially if you’re doing memory-intensive transformations such as sorts, merges, groupings, pivots or unpivots. Of course you should be removing all columns you don’t need for your reports anyway, but the point here is that:

  • You should remove columns you don’t need in your dataset as early as possible in your queries – see the sketch after this list
  • In situations where you need to do more complex transformations in intermediate queries (i.e. queries that don’t load directly into the dataset but whose output is used by queries that do), remove all but the columns needed by the transformation
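
As a sketch, here’s the query above with the column removal moved even earlier, immediately after the source step:

[sourcecode language='text' ]
let
    Source = 
    Csv.Document(
        File.Contents(
            "C:\Users\chwebb\Downloads\pp-2018.csv"
            ),
            [Delimiter=",", Columns=16, 
            Encoding=1252, QuoteStyle=QuoteStyle.None]),
    //Remove all columns except Column2 as early as possible
    #"Removed Other Columns" = 
    Table.SelectColumns(
        Source,
        {"Column2"}
        ),
    #"Changed Type" = 
    Table.TransformColumnTypes(
        #"Removed Other Columns",
        {{"Column2", Int64.Type}}
        ),
    #"Sorted Rows" = 
    Table.Sort(
        #"Changed Type",
        {{"Column2", Order.Descending}}
        ),
    Column2 = 
    #"Sorted Rows"{0}[Column2]
in
    Column2
[/sourcecode]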

Tune in next week for part 2, where I’ll show even more examples of how this particular query can be tuned – and I promise that all you Power Query fans will learn something new!

Adding More Aggregate Columns To The Output Of Table.Profile

A few years ago I blogged about the Table.Profile M function and how you could use it to create a table of descriptive statistics for your data:

https://blog.crossjoin.co.uk/2016/01/12/descriptive-statistics-in-power-bim-with-table-profile/

Since that post was written a new, optional second parameter has been added to the function called additionalAggregates which allows you to add your own custom columns containing aggregate values to the output of Table.Profile, so I thought I’d write a follow-up on how to use it.

Consider the following query:

[sourcecode language='text'  padlinenumbers='true']
let
    Source = 
    #table(
        type table[Month = text, Sales = number],
        {
            {"Jan",1},
            {"Feb",5},
            {"Mar",17},
            {"Apr",1}
        }
    ),
    Profile = 
    Table.Profile(
        Source
    )
in
    Profile
[/sourcecode]

There are two steps here. Source returns the following table using the #table function:

image

The second step uses Table.Profile to return a table of summary statistics:

image

[Several columns returned aren’t shown in the screenshot above]

If you want some extra columns on this table you can now add them with the second additionalAggregates parameter of Table.Profile like so:

[sourcecode language='text'  htmlscript='false' highlight='15,16,17,18']
let
    Source = 
    #table(
        type table[Month = text, Sales = number],
        {
            {"Jan",1},
            {"Feb",5},
            {"Mar",17},
            {"Apr",1}
        }
    ),
    Profile = 
    Table.Profile(
        Source,
        {
        {"Median", each Type.Is(_, type number), List.Median},
        {"Mode", each Type.Is(_, type number), List.Mode}
        }
    )
in
    Profile
[/sourcecode]

In this example I’ve added two extra columns called Median and Mode, which return the median and mode values of every numeric column:

image

The additionalAggregates parameter takes a list, where each item is itself a list of three items:

  • The name of the new column to add.
  • A function to call to determine whether the aggregate will be applied to a given column in the source table. In this case I’m checking to see if a column is of type number, and only returning a median or mode if it is.
  • A function to call to calculate the aggregate value across the column: in this case List.Median and List.Mode. To ensure fast performance, it’s a good idea to use an aggregate function that does not require storing every value in the column in memory (ideally one that can be folded, if you’re using a data source where query folding is supported). There’s an example of a custom aggregate function in the sketch after this list.
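
As an illustration of writing your own aggregate function, here’s a hypothetical extra column called Range that returns the difference between the largest and smallest value in each numeric column:

[sourcecode language='text' ]
let
    Source = 
    #table(
        type table[Month = text, Sales = number],
        {
            {"Jan",1},
            {"Feb",5},
            {"Mar",17},
            {"Apr",1}
        }
    ),
    Profile = 
    Table.Profile(
        Source,
        {
        //Custom aggregate: the difference between the
        //largest and smallest value in a numeric column
        {"Range", each Type.Is(_, type number), 
            each List.Max(_) - List.Min(_)}
        }
    )
in
    Profile
[/sourcecode]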

[Thanks to Curt Hagenlocher for the information here]

Excel Dynamic Arrays And FilterXML

I’ll admit that I’m a bit less excited about Excel dynamic arrays than I was when I last blogged about them. Don’t get me wrong: from a pure Excel point of view they are still very cool, but I’ve since found out that the CubeValue function can’t be used with dynamic arrays, which prevents me from doing all the really fun SSAS/Power BI/Power Pivot things I wanted to do with cube formulas.

It’s not all doom and gloom though. Several years ago I blogged about the then-new WebService and FilterXML functions (here and here). I very quickly found that the WebService function was very limited indeed and that Power Query did everything it did but better; on the other hand I felt FilterXML had some unexplored potential, although I never got round to doing any exploring. Fast forward to last year, when I saw that FilterXML was one of a number of existing functions affected by the new dynamic array behaviour – a change that makes it even more flexible.

Consider the following, publicly-available XML file:

http://www.hmrc.gov.uk/softwaredevelopers/rates/exrates-monthly-0719.xml

It’s a list of exchange rates published by the UK tax authorities and the contents look like this:

image

Just the kind of data you want to bring into Excel. The WebService function doesn’t work with this URL for some reason, but it’s very easy to bring this data into an Excel table with a few clicks in Power Query using the Web data source:

image

If you prefer to work with dynamic arrays though (and I’m sure there are lots of reasons why that might be true), you can do that with a combination of Power Query and FilterXML.

The first thing to do is to use Power Query to load the entire XML document into a single cell in a worksheet. Here’s a query to do this:

[sourcecode language='text'  padlinenumbers='true']
let
    Source = 
    Text.FromBinary(
        Web.Contents(
            "http://www.hmrc.gov.uk/softwaredevelopers" &
            "/rates/exrates-monthly-0719.xml"
        )
    ),
    #"Converted to Table" = 
    #table(
        1, 
        {{Source}}
    ),
    #"Cleaned Text" = 
    Table.TransformColumns(
        #"Converted to Table",
        {{"Column1", Text.Clean, type text}}
    ),
    #"Replaced Value" = 
    Table.ReplaceValue(
        #"Cleaned Text",
        "> <",
        "><",
        Replacer.ReplaceText,{"Column1"}
    ),
    #"Replaced Value1" = 
    Table.ReplaceValue(
        #"Replaced Value",
        ">  <",
        "><",
        Replacer.ReplaceText,{"Column1"}
    ),
    #"Replaced Value2" = 
    Table.ReplaceValue(
        #"Replaced Value1",
        ">   <",
        "><",
        Replacer.ReplaceText,
        {"Column1"}
    ),
    #"Replaced Value3" = 
    Table.ReplaceValue(
        #"Replaced Value2",
        ">    <",
        "><",
        Replacer.ReplaceText,{"Column1"}
    )
in
    #"Replaced Value3"
[/sourcecode]

One interesting point to make here: the FilterXML function does not like spaces between closing and opening angle brackets in XML (maybe this is why the WebService function errors too?), so I’m removing all occurrences of these, as well as removing any unprintable characters. Here’s the output, a table with one column and one row, where the only cell contains the full XML:

image

You can then use an Excel formula like this to run an XPath query against this XML document:

[sourcecode language='text' ]
=FILTERXML($A$2, "/exchangeRateMonthList/exchangeRate/countryName")
[/sourcecode]

to get a list of all the country names spilling out to as many rows in the worksheet as necessary:

image

[If you want a comparison with how FilterXML used to work in all its CTRL+SHIFT+ENTER glory see the “Scraping a whole XML document” section here; if you want to learn XPath, the query language used by FilterXML, there is a good tutorial here]
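
You can pull back other fields from the same document just by changing the XPath query. For example – and I’m assuming here that the element containing the rate in these files is called rateNew, so check the XML for the actual element names – a formula like this spills the rates themselves:

[sourcecode language='text' ]
=FILTERXML($A$2, "/exchangeRateMonthList/exchangeRate/rateNew")
[/sourcecode]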

I’m not an XPath expert, or even an Excel expert, so I’ll finish here but hopefully this will prove useful to someone. You can download an Excel workbook containing the demos from this post here – note that neither Power Query nor dynamic arrays work in Excel Online yet, so don’t look at the workbook in the browser.

SSASDiag: A Tool To Help Diagnose Analysis Services Problems

There are a lot of great community-developed tools out there for Analysis Services developers to use (BI Developer Extensions, DAX Studio, Tabular Editor, Analysis Services Query Analyzer to name a few) and they have saved me vast amounts of time and effort over the years. When I joined Microsoft last month I came across one which I had never seen before but which is nevertheless quite mature and feature-rich: the SSAS Diagnostics Tool or SSASDiag for short. It’s available on GitHub here:
https://github.com/ssasdiag/SSASDiag

…and you can read the documentation here:
https://github.com/ssasdiag/SSASDiag/wiki/SSAS-Diagnostics—Analysis

image

It’s an open source tool developed by the people who support Analysis Services here at Microsoft and is intended to help them collect and analyse the information they need to troubleshoot on-premises SSAS issues, but it’s available for anyone to use. I haven’t had a chance to take a proper look at it myself yet, unfortunately, but I thought any SSAS fans out there would want to check it out.

[Thanks to Jon Burchel for providing all the background information for this post]