Excel Dynamic Arrays And FilterXML

I’ll admit that I’m a bit less excited about Excel dynamic arrays than I was when I last blogged about them. Don’t get me wrong: from a pure Excel point-of-view they are still very cool, but I’ve since found out that the CubeValue function can’t be used with dynamic arrays which prevents me from doing all the really fun SSAS/Power BI/Power Pivot things I wanted to do with cube formulas.

It’s not all doom and gloom though. Several years ago I blogged about the then-new WebService and FilterXML functions (here and here). I very quickly found that the WebService function was very limited indeed and that Power Query did everything it did but better; on the other hand I felt FilterXML I had some unexplored potential, although I never got round to doing any exploring. Fast forward to last year and I saw that FilterXML was one of a number of existing functions that are affected by the new dynamic array behaviour, a change that makes it even more flexible.

Consider the following, publicly-available XML file:

http://www.hmrc.gov.uk/softwaredevelopers/rates/exrates-monthly-0719.xml

It’s a list of exchange rates published by the UK tax authorities and the contents look like this:

image

Just the kind of data you want to bring into Excel. The WebService function doesn’t work with this URL for some reason but it’s very easy to bring this data into an Excel table using Power Query with a few clicks using the Web data source:

If you prefer to work with dynamic arrays though (and I’m sure there are lots of reasons why that might be true), you can do that with a combination of Power Query and FilterXML.

The first thing to do is to use Power Query to load the entire XML document into a single cell in a worksheet. Here’s a query to do this:

[sourcecode language='text'  padlinenumbers='true']
let
    Source = 
    Text.FromBinary(
        Web.Contents(
            "http://www.hmrc.gov.uk/softwaredevelopers" &
            "/rates/exrates-monthly-0719.xml"
        )
    ),
    #"Converted to Table" = 
    #table(
        1, 
        {{Source}}
    ),
    #"Cleaned Text" = 
    Table.TransformColumns(
        #"Converted to Table",
        {{"Column1", Text.Clean, type text}}
    ),
    #"Replaced Value" = 
    Table.ReplaceValue(
        #"Cleaned Text",
        "> <",
        "><",
        Replacer.ReplaceText,{"Column1"}
    ),
    #"Replaced Value1" = 
    Table.ReplaceValue(
        #"Replaced Value",
        ">  <",
        "><",
        Replacer.ReplaceText,{"Column1"}
    ),
    #"Replaced Value2" = 
    Table.ReplaceValue(
        #"Replaced Value1",
        ">   <",
        "><",
        Replacer.ReplaceText,
        {"Column1"}
    ),
    #"Replaced Value3" = 
    Table.ReplaceValue(
        #"Replaced Value2",
        ">    <",
        "><",
        Replacer.ReplaceText,{"Column1"}
    )
in
    #"Replaced Value3"
[/sourcecode]

One interesting point to make here: the FilterXML function does not like spaces between closing and opening angle brackets in XML (maybe this is why WebService errors too?) so I’m removing all occurrences of this, as well as removing any unprintable characters. Here’s the output, a table with one column and one row where the only cell contains the full XML:

You can then use an Excel formula like this to run an XPath query against this XML document:

[sourcecode language='text' ]
=FILTERXML($A$2, "/exchangeRateMonthList/exchangeRate/countryName")
[/sourcecode]

to get a list of all the country names spilling out to as many rows in the worksheet as necessary:

[If you want a comparison with how FilterXML used to work in all its CTRL+SHIFT+ENTER glory see the “Scraping a whole XML document” section here; if you want to learn XPath, the query language used by FilterXML, there is a good tutorial here]

I’m not an XPath expert, or even an Excel expert, so I’ll finish here but hopefully this will prove useful to someone. You can download an Excel workbook containing the demos from this post here – note that neither Power Query nor dynamic arrays work in Excel Online yet, so don’t look at the workbook in the browser.

2 thoughts on “Excel Dynamic Arrays And FilterXML

  1. Dynamic Array Formulas are gold. They will revolutionize the way people work with formulas in Excel

    Look at the attached file – Formula ETL.xlsx

    https://1drv.ms/u/s!AiKBTsYfZw-vgrtkdDXnLzxgimxokg?e=UlondZ

    You can now do ETL through formulas
    If “D” is the name of a Table Structured Reference in Excel then
    the below formula does the following
    Remove Columns, Rearrange Columns, Filters the Data on a Complex criterion and sorts the data by the first 4 columns of the table.

    =SORT(INDEX(FD,SEQUENCE(ROWS(FD)),ColOrder),SortOrder)

    Where
    FD = FILTER(D,COUNTIFS(MONTH,D[MONTH])*COUNTIFS(REGION,D[REGION])*(D[BUDGETS]>BUDGETS))
    and ColOrder = {3,4,5,15,16,7,8,9,10,11,12,13,14,17}
    and SortOrder = {1,2,3,4}

    In SQL it would have been

    SELECT

    `DATA$`.PRODUCT_TYPE, `DATA$`.PRODUCT_CATEGEORY, `DATA$`.PRODUCT, `DATA$`.REGION, `DATA$`.EMP, `DATA$`.CUSTOMER_TYPE, `DATA$`.INV_MONTH, `DATA$`.INV_YEAR, `DATA$`.MONTH, `DATA$`.HRS, `DATA$`.BUDGETS, `DATA$`.ACTUALS, `DATA$`.TYPE

    FROM
    `C:\DATA\FORMULA_ETL.xlsx`.`DATA$` `DATA$`

    WHERE
    (`DATA$`.MONTH=’JAN’) AND (`DATA$`.REGION=’NORTH’) AND (`DATA$`.BUDGETS>10000) OR (`DATA$`.MONTH=’FEB’) AND (`DATA$`.REGION=’NORTH’) AND (`DATA$`.BUDGETS>10000) OR (`DATA$`.MONTH=’JAN’) AND (`DATA$`.REGION=’SOUTH’) AND (`DATA$`.BUDGETS>10000) OR (`DATA$`.MONTH=’FEB’) AND (`DATA$`.REGION=’SOUTH’) AND (`DATA$`.BUDGETS>10000) OR (`DATA$`.MONTH=’MAR’) AND (`DATA$`.REGION=’NORTH’) AND (`DATA$`.BUDGETS>10000) OR (`DATA$`.MONTH=’MAR’) AND (`DATA$`.REGION=’SOUTH’) AND (`DATA$`.BUDGETS>10000)

    ORDER BY
    `DATA$`.PRODUCT_TYPE, `DATA$`.PRODUCT_CATEGEORY, `DATA$`.PRODUCT, `DATA$`.REGION

    in Power Query, this would have been

    let
    Source = Excel.CurrentWorkbook(){[Name=”D”]}[Content],
    mFilter = Table.SelectRows(Source, each ([MONTH] = “JAN” or [MONTH] = “FEB” or [MONTH] = “MAR”) and ([REGION]=”NORTH” or [REGION]=”SOUTH”) and ([BUDGETS]>10000)),
    mRemCols = Table.RemoveColumns(mFilter,{“INV_NO”, “CLIENT”, “INV_DATE”, “AGE”}),
    mReOrdCols = Table.ReorderColumns(mRemCols,{“PRODUCT_TYPE”, “PRODUCT_CATEGEORY”, “PRODUCT”, “EMP”, “REGION”, “CUSTOMER_TYPE”, “INV_MONTH”, “INV_YEAR”, “MONTH”, “HRS”, “BUDGETS”, “ACTUALS”, “TYPE”}),
    mSort = Table.Sort(mReOrdCols,{{“PRODUCT_TYPE”, Order.Ascending}, {“PRODUCT_CATEGEORY”, Order.Ascending}, {“PRODUCT”, Order.Ascending}, {“EMP”, Order.Ascending}})
    in
    mSort

Leave a ReplyCancel reply