I’ll admit that I’m a bit less excited about Excel dynamic arrays than I was when I last blogged about them. Don’t get me wrong: from a pure Excel point-of-view they are still very cool, but I’ve since found out that the CubeValue function can’t be used with dynamic arrays which prevents me from doing all the really fun SSAS/Power BI/Power Pivot things I wanted to do with cube formulas.
It’s not all doom and gloom though. Several years ago I blogged about the then-new WebService and FilterXML functions (here and here). I very quickly found that the WebService function was very limited indeed and that Power Query did everything it did but better; on the other hand I felt FilterXML I had some unexplored potential, although I never got round to doing any exploring. Fast forward to last year and I saw that FilterXML was one of a number of existing functions that are affected by the new dynamic array behaviour, a change that makes it even more flexible.
Consider the following, publicly-available XML file:
http://www.hmrc.gov.uk/softwaredevelopers/rates/exrates-monthly-0719.xml
It’s a list of exchange rates published by the UK tax authorities and the contents look like this:
Just the kind of data you want to bring into Excel. The WebService function doesn’t work with this URL for some reason but it’s very easy to bring this data into an Excel table using Power Query with a few clicks using the Web data source:
If you prefer to work with dynamic arrays though (and I’m sure there are lots of reasons why that might be true), you can do that with a combination of Power Query and FilterXML.
The first thing to do is to use Power Query to load the entire XML document into a single cell in a worksheet. Here’s a query to do this:
[sourcecode language='text' padlinenumbers='true'] let Source = Text.FromBinary( Web.Contents( "http://www.hmrc.gov.uk/softwaredevelopers" & "/rates/exrates-monthly-0719.xml" ) ), #"Converted to Table" = #table( 1, {{Source}} ), #"Cleaned Text" = Table.TransformColumns( #"Converted to Table", {{"Column1", Text.Clean, type text}} ), #"Replaced Value" = Table.ReplaceValue( #"Cleaned Text", "> <", "><", Replacer.ReplaceText,{"Column1"} ), #"Replaced Value1" = Table.ReplaceValue( #"Replaced Value", "> <", "><", Replacer.ReplaceText,{"Column1"} ), #"Replaced Value2" = Table.ReplaceValue( #"Replaced Value1", "> <", "><", Replacer.ReplaceText, {"Column1"} ), #"Replaced Value3" = Table.ReplaceValue( #"Replaced Value2", "> <", "><", Replacer.ReplaceText,{"Column1"} ) in #"Replaced Value3" [/sourcecode]
One interesting point to make here: the FilterXML function does not like spaces between closing and opening angle brackets in XML (maybe this is why WebService errors too?) so I’m removing all occurrences of this, as well as removing any unprintable characters. Here’s the output, a table with one column and one row where the only cell contains the full XML:
You can then use an Excel formula like this to run an XPath query against this XML document:
[sourcecode language='text' ] =FILTERXML($A$2, "/exchangeRateMonthList/exchangeRate/countryName") [/sourcecode]
to get a list of all the country names spilling out to as many rows in the worksheet as necessary:
[If you want a comparison with how FilterXML used to work in all its CTRL+SHIFT+ENTER glory see the “Scraping a whole XML document” section here; if you want to learn XPath, the query language used by FilterXML, there is a good tutorial here]
I’m not an XPath expert, or even an Excel expert, so I’ll finish here but hopefully this will prove useful to someone. You can download an Excel workbook containing the demos from this post here – note that neither Power Query nor dynamic arrays work in Excel Online yet, so don’t look at the workbook in the browser.
= Xml.Tables(Web.Contents(“https://www.hmrc.gov.uk/softwaredevelopers/rates/exrates-monthly-0719.xml”))[exchangeRate]{0}
Dynamic Array Formulas are gold. They will revolutionize the way people work with formulas in Excel
Look at the attached file – Formula ETL.xlsx
https://1drv.ms/u/s!AiKBTsYfZw-vgrtkdDXnLzxgimxokg?e=UlondZ
You can now do ETL through formulas
If “D” is the name of a Table Structured Reference in Excel then
the below formula does the following
Remove Columns, Rearrange Columns, Filters the Data on a Complex criterion and sorts the data by the first 4 columns of the table.
=SORT(INDEX(FD,SEQUENCE(ROWS(FD)),ColOrder),SortOrder)
Where
FD = FILTER(D,COUNTIFS(MONTH,D[MONTH])*COUNTIFS(REGION,D[REGION])*(D[BUDGETS]>BUDGETS))
and ColOrder = {3,4,5,15,16,7,8,9,10,11,12,13,14,17}
and SortOrder = {1,2,3,4}
In SQL it would have been
SELECT
`DATA$`.PRODUCT_TYPE, `DATA$`.PRODUCT_CATEGEORY, `DATA$`.PRODUCT, `DATA$`.REGION, `DATA$`.EMP, `DATA$`.CUSTOMER_TYPE, `DATA$`.INV_MONTH, `DATA$`.INV_YEAR, `DATA$`.MONTH, `DATA$`.HRS, `DATA$`.BUDGETS, `DATA$`.ACTUALS, `DATA$`.TYPE
FROM
`C:\DATA\FORMULA_ETL.xlsx`.`DATA$` `DATA$`
WHERE
(`DATA$`.MONTH=’JAN’) AND (`DATA$`.REGION=’NORTH’) AND (`DATA$`.BUDGETS>10000) OR (`DATA$`.MONTH=’FEB’) AND (`DATA$`.REGION=’NORTH’) AND (`DATA$`.BUDGETS>10000) OR (`DATA$`.MONTH=’JAN’) AND (`DATA$`.REGION=’SOUTH’) AND (`DATA$`.BUDGETS>10000) OR (`DATA$`.MONTH=’FEB’) AND (`DATA$`.REGION=’SOUTH’) AND (`DATA$`.BUDGETS>10000) OR (`DATA$`.MONTH=’MAR’) AND (`DATA$`.REGION=’NORTH’) AND (`DATA$`.BUDGETS>10000) OR (`DATA$`.MONTH=’MAR’) AND (`DATA$`.REGION=’SOUTH’) AND (`DATA$`.BUDGETS>10000)
ORDER BY
`DATA$`.PRODUCT_TYPE, `DATA$`.PRODUCT_CATEGEORY, `DATA$`.PRODUCT, `DATA$`.REGION
in Power Query, this would have been
let
Source = Excel.CurrentWorkbook(){[Name=”D”]}[Content],
mFilter = Table.SelectRows(Source, each ([MONTH] = “JAN” or [MONTH] = “FEB” or [MONTH] = “MAR”) and ([REGION]=”NORTH” or [REGION]=”SOUTH”) and ([BUDGETS]>10000)),
mRemCols = Table.RemoveColumns(mFilter,{“INV_NO”, “CLIENT”, “INV_DATE”, “AGE”}),
mReOrdCols = Table.ReorderColumns(mRemCols,{“PRODUCT_TYPE”, “PRODUCT_CATEGEORY”, “PRODUCT”, “EMP”, “REGION”, “CUSTOMER_TYPE”, “INV_MONTH”, “INV_YEAR”, “MONTH”, “HRS”, “BUDGETS”, “ACTUALS”, “TYPE”}),
mSort = Table.Sort(mReOrdCols,{{“PRODUCT_TYPE”, Order.Ascending}, {“PRODUCT_CATEGEORY”, Order.Ascending}, {“PRODUCT”, Order.Ascending}, {“EMP”, Order.Ascending}})
in
mSort