Excel Cube Functions, Dynamic Arrays And Lambdas, Part 1: Getting All The Items From A Set

After my recent post on using Office Scripts and cube functions to generate Excel reports from Power BI data, Meagan Longoria asked me this question on Twitter:

To which I can only reply: guilty as charged. I have always loved the Excel cube functions even though they are maybe the least appreciated, least known and least used feature in the whole Microsoft BI stack. They have their issues (including sometimes performance) but they are great for building certain types of report in Excel that can’t be built any other way.

Anyway, the recent addition of new Lambda helper functions to Excel has made me particularly happy because they can be used with cube functions to overcome some limitations that have existed since cube functions were first released in Excel 2007, and to do some other cool things too. In this series of posts I’m going to explore some of the things they make possible.

Let’s start with something simple. In Excel, the CubeSet function can be used to return an (MDX) set of items. This set is stored in a single cell, though, and to extract each item into a cell on your worksheet you need to use the CubeRankedMember function. For example, let’s say I have a table called Sales on my worksheet:

…that is then loaded into the Excel Data Model (aka Power Pivot – although this works exactly the same if I use a Power BI dataset, Azure Analysis Services or SQL Server Analysis Services as my source):

What you can then do is use the CubeSet function to create a set of all the products like so:

=CUBESET("ThisWorkbookDataModel", "[Sales].[Product].[Product].MEMBERS", "Product Set")

…and then use the CubeRankedMember function to put each individual item of the set into a cell. Here’s a simple example worksheet, first with the formulas showing and then the results:

This example shows the fundamental problem that has always existed with CubeRankedMember, though: in order to show all the items in a set you need to know how many items there are in advance, and populate as many cells with CubeRankedMember formulas as there are items. In this case, see how the range B4:B6 contains the numbers 1, 2 and 3; these numbers are used by the formulas in the range C4:C6 to get the first, second and third items in the set.
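For reference, the three formulas in C4:C6 look something like this – a sketch, assuming the CubeSet is in B2 and the rank numbers are in B4:B6:

=CUBERANKEDMEMBER("ThisWorkbookDataModel", $B$2, B4)
=CUBERANKEDMEMBER("ThisWorkbookDataModel", $B$2, B5)
=CUBERANKEDMEMBER("ThisWorkbookDataModel", $B$2, B6)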

If a fourth product were added to the table, however, it would not appear automatically – you would have to add another cell with another CubeRankedMember formula in it manually. I’ve seen some workarounds but they’re a bit hacky and require you to know the maximum possible number of items the set could ever contain. Indeed, that’s always been one of the key differences between cube functions and PivotTables: cube functions are static whereas PivotTables grow and shrink dynamically when the data changes.

The new MakeArray function in Excel provides a really elegant solution to this problem: you can now write a single formula that returns a dynamic array containing all the items in the set. Assuming that the same CubeSet exists in B2 as shown above, you can do the following:

=MAKEARRAY(CUBESETCOUNT($B$2), 1, LAMBDA(r,c,CUBERANKEDMEMBER("ThisWorkbookDataModel",Sheet3!$B$2,r)))

Here’s the output:

Notice how the formula in cell B4 returns an array containing all three items in the set, which spills into the range B4:B6.

How does this work?

  1. The CubeSetCount function is used to get the number of items in the CubeSet in B2.
  2. The MakeArray function is then used to create an array with that number of rows and one column.
  3. In the third parameter of MakeArray, the Lambda function is used to create a function that wraps CubeRankedMember, which is then called with the current row number of the array.
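If you find the nested functions hard to read, the LET function can be used to break the same formula into named steps – a sketch, equivalent to the formula above and again assuming the CubeSet is in B2 on the same sheet:

=LET(
    itemCount, CUBESETCOUNT($B$2),
    MAKEARRAY(itemCount, 1, LAMBDA(r, c, CUBERANKEDMEMBER("ThisWorkbookDataModel", $B$2, r)))
)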

The nice thing about this is that when more products are added to the Sales table they automatically appear in the output of the MakeArray formula in B4. So, for example, with two more products added to the Sales table like so:

Here’s the new output of the formula, showing the two new products returned in the array automatically:

This is not very useful on its own though. In my next post I’ll show you how this can be used to build a simple report.

Query Pending Event In Power BI Performance Analyzer

A quick note for anyone like me who spends too much time looking at the JSON exports from Performance Analyzer in Power BI Desktop: you may have noticed an event called Query Pending that isn’t (as yet) documented in the Word doc that explains the format of these JSON files.

It turns out that it’s not that interesting – it’s an event that has been added as part of an effort to make sure there are events to cover the whole of the query lifecycle. After the DAX queries for each visual in your report are generated they are added to a queue before they are executed. In some cases there could be several queries in the queue waiting to be executed, in which case they are said to be “pending”, and the Query Pending event tells you how long a query is in this pending state.

I haven’t seen a duration longer than a couple of milliseconds for this event though, so you probably don’t need to worry much about it. If you ever do see a long Query Pending event please leave a comment – I’m curious to know what the cause might be.

[Thanks to John Vulner and Jon Ludwig for this information]

Generating Excel Reports Using Power BI, Cube Formulas, Office Scripts And Power Automate

Now that Excel reports connected to Power BI datasets work in Excel Online, a lot of new possibilities open up for doing cool things with Office Scripts and Power Automate. Here’s a simple example showing how all these technologies can be put together to automatically generate batches of Excel reports from a template.

Step 1: Create a template report in Excel using cube formulas

In Excel on the desktop I created a new Excel file, created a connection to a Power BI dataset and then built a simple report using Excel cube formulas:

Here are the Excel formulas for the table on the left:

This report uses data from the UK’s Land Registry (one of my favourite data sources) and shows the average price paid and number of sales broken down by property type for a single county (specified in cell B2 of this report – in the screenshot above data for Bournemouth is shown). Here’s the formula in B2:

=CUBEMEMBER("Price Paid", "[Property Transactions].[County].[All].[BOURNEMOUTH]")

This formula is referenced by all the CUBEVALUE formulas in the body of the table, so they are all sliced by the selected county.
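For example, one of the CUBEVALUE formulas in the body of the table might look something like this – a sketch, since the exact measure and member names depend on the dataset; here $A5 is assumed to contain a CUBEMEMBER formula returning a property type and B$4 one returning a measure:

=CUBEVALUE("Price Paid", $B$2, $A5, B$4)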

After doing this, I saved the file to OneDrive for Business.

Step 2: Create an Office Script to change the county shown in cell B2

The aim of this exercise is to generate one copy of the report above for each county in a list of counties, so the next thing I did was create a parameterised Office Script that takes the name of a county and changes the county name used in the formula in cell B2. To do this I opened the Excel report in Excel Online, started the script recorder, changed the formula in B2 and then stopped recording. I then edited this script to take a parameter for the county name (called county) to use in the formula. Here’s the script:

function main(workbook: ExcelScript.Workbook, county: string) {
	let selectedSheet = workbook.getActiveWorksheet();
	// Set range B2 on selectedSheet
	selectedSheet.getRange("B2").setFormulaLocal("=CUBEMEMBER(\"Price Paid\", \"[Property Transactions].[County].[All].[" + county + "]\")");
}
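One small caveat: getActiveWorksheet() returns whichever sheet happens to be active, which is fine for a single-sheet template but could be fragile if more sheets are added later. A safer variant (assuming the report is on a sheet called Sheet1) gets the worksheet by name instead:

function main(workbook: ExcelScript.Workbook, county: string) {
	// Reference the worksheet by name rather than relying on the active sheet
	let selectedSheet = workbook.getWorksheet("Sheet1");
	// Set range B2 on selectedSheet
	selectedSheet.getRange("B2").setFormulaLocal("=CUBEMEMBER(\"Price Paid\", \"[Property Transactions].[County].[All].[" + county + "]\")");
}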

Step 3: Create a list of counties to pass to the script

Next, I created a second Excel workbook containing a table that contained the county names to pass to the script and saved this to OneDrive for Business too:

Step 4: Create a Power Automate flow to call the script once for each county in the Excel table

Last of all, I created a Power Automate flow that reads the county names from the table in the previous step, runs the script for each county, creates a copy of the original Excel report after each script run and then saves it to a folder. Here’s the flow at a high level:
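In outline it contains something like the following actions – a sketch based on the description above; the loop must be the ‘Apply to each’ that the expressions below refer to as ‘CountyLoop’:

  • List rows present in a table – reads the county names from the workbook created in step 3
  • Apply to each (‘CountyLoop’) – loops over the rows returned; inside the loop:
  • Run script – runs the Office Script from step 2, passing in the current county name
  • Get file content / Create file – copies the updated report to the destination folder, named after the county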

In more detail, here’s the setup for the ‘List rows present in a table’ action:

Here’s the ‘Run script’ action:

Here’s the expression used to get the current county name in the loop:

items('CountyLoop')?['Counties']

…and here’s the expression used to create the destination file path:

concat('/Office Scripts Reporting/Reports/',items('CountyLoop')?['Counties'],'.xlsx')

Running this flow results in three Excel workbooks being created, one for each county with the county name in the workbook name, stored in a folder like so:

Here’s the report in BATH AND NORTH EAST SOMERSET.xlsx:

Of course I could do other things at this point like email these workbooks to different people, but there’s no need to overcomplicate things – I hope you’ve got the idea.

A few last points to make:

  • Office Scripts don’t seem to work with PivotTables connected to Power BI datasets yet – I’m sure it’s just a matter of time before they do though
  • How is this different from using Power Automate to call the Power BI export API? A paginated report can be exported to Excel, but this method gives you a lot more flexibility because it allows you to use a lot more Excel functionality, not just the functionality that paginated reports can use in their exports. It also gives you a report that is connected live back to a dataset using cube functions, rather than static data.
  • Generating large numbers of Excel reports like this is not something I like to encourage – why not view your report in the Power BI portal, especially now that you can view live Excel reports connected to datasets there too? – but I know it’s something that customers ask for.
  • I haven’t done any performance testing but I suspect that this method may be faster than using the Power BI export API in Power Automate.

Power BI Dataset Refresh, Column Encoding And The First Partition

If you’ve been following some of my recent posts about improving Power BI refresh performance by partitioning tables you will have seen a lot of screenshots that look like the one below:

It’s a visualisation from a report created by my colleague Phil Seamark (as detailed in this blog post) showing how long all the partitions in a dataset take to refresh. If you look at these visualisations you’ll probably ask the same question I did: why does the first partition always start before the others?

It turns out this is because when a table is refreshed, the first thing that has to happen is that a certain amount of data is read so that the type of encoding (Value or Hash) used for each column can be determined. In most cases tables only contain one partition, so it’s not obvious that this is happening; when a table has more than one partition, though, it happens only for the first partition – which explains why the first partition seems to start before the others. You can’t avoid this step, but you can reduce its impact a little by using encoding hints (see here and here for more details): it can be skipped for columns that have a Hash encoding hint, or that the engine knows in advance have to use Hash encoding, although it cannot be skipped for columns that have a Value encoding hint. What’s more, the Execute SQL event for the first partition has to complete before the Execute SQL events for all the other partitions can start.
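For reference, here’s a sketch of what a column definition with an encoding hint looks like in a dataset’s TMSL definition – the table and column names here are made up, and the hint can also be set with tools like Tabular Editor:

{
  "name": "TransactionID",
  "dataType": "int64",
  "sourceColumn": "TransactionID",
  "encodingHint": "Hash"
}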

[Thanks to Akshai Mirchandani for the information in this post]

Dynamic M Parameters, Snowflake Native SQL And Paginated Reports

There were a few new features and enhancements to existing features in the June 2021 Power BI Desktop release that don’t seem to have much to do with each other, but which I think can be combined to do cool things. They are:

  1. The new paginated report visual
  2. Native SQL support in the Snowflake connector
  3. Improvements to dynamic M parameters

Let me give you an example of what I mean…

First of all, let’s start with native SQL support in the Snowflake connector. I deal with a lot of customers who use Snowflake and Power BI together and I know just how much people have wanted this. What does it allow you to do? Well, you have always been able to use the Power Query Editor to transform data coming from Snowflake in either Import mode or DirectQuery mode. Now, though, you can write your own native SQL query and use it as the source for a Power Query query (something that has always been possible with some other connectors, such as the SQL Server connector). Incidentally, this also means that the EnableFolding=true option for Value.NativeQuery that I blogged about recently now works for Snowflake too.

The main reason you’d want to use a native SQL query when connecting to Snowflake, or indeed any database, is to do something that’s possible in SQL but not in Power Query. One example of this is using regular expressions to filter data. I have the AdventureWorks DW DimCustomer table loaded into Snowflake and I can use Snowflake’s REGEXP function to filter the LASTNAME column, something like this:

SELECT 
DISTINCT FIRSTNAME, LASTNAME, ENGLISHOCCUPATION 
FROM "AWORKS"."PUBLIC"."DIMCUSTOMER" 
WHERE LASTNAME REGEXP 'To.*'

So that’s useful. I can use a query like this as the source of a table in DirectQuery mode in Power BI, but wouldn’t it be even better if end users of my report could change the regular expression used to filter the data? This is where dynamic M parameters come in. Assuming I have a table of pre-defined regular expressions:

And an M parameter:
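Behind the scenes a parameter created in the Power Query Editor is just an M expression, so pRegEx might be defined something like this – a sketch, with ‘To.*’ as the default value:

"To.*" meta [IsParameterQuery = true, Type = "Text", IsParameterQueryRequired = true]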

…I can write an M query like this that uses the M parameter to return the regular expression used in the WHERE clause of the SQL query:

let
  Source = Value.NativeQuery(
    Snowflake.Databases(
      "mysnowflake.com", 
      "DEMO_WH"
    ){[Name = "AWORKS"]}[Data], 
    "SELECT DISTINCT FIRSTNAME, LASTNAME, ENGLISHOCCUPATION 
    FROM ""AWORKS"".""PUBLIC"".""DIMCUSTOMER"" 
    WHERE LASTNAME REGEXP '"
      & pRegEx
      & "'", 
    null, 
    [EnableFolding = true]
  )
in
  Source

…and then turn this into a dynamic M parameter in the Power BI diagram pane:

…and get a report that does this:

One limitation of dynamic M parameters in regular Power BI reports today is that the values you pass into them have to come from a column somewhere inside your dataset, so all of these values have to be pre-defined. Wouldn’t it be useful if the end user could enter any regular expression they wanted, though? That may not be possible in a regular Power BI report, but it is possible with a paginated report: in a paginated report you can write whatever DAX query you want – and therefore pass any value you want to a dynamic M parameter – and you can also create parameters where the user can enter whatever value they want.

I blogged about how to write DAX queries that contain dynamic M parameters here. Here’s an example of a parameterised DAX query (yes, I know, so many types of parameters…) that takes a regular expression and the name of an occupation and returns a table of customers whose last names match the regular expression and whose occupations match the one entered:

DEFINE
    MPARAMETER pRegEx = @DAXRegExParam
EVALUATE
FILTER (
    Customers,
    'Customers'[ENGLISHOCCUPATION] = @DAXOccupationParam
)

This can be used in a paginated report dataset connected to the Power BI dataset created above (yes, I know, so many types of datasets…) like so:

…which can then be used to build a paginated report that does this:

And of course, with the new paginated report visual, this paginated report can be embedded in a regular Power BI report:

All this is very much a proof-of-concept and not something I would recommend for production (I would be worried about SQL injection attacks for a start). There are more enhancements to these features still to come too. However, I do think it’s interesting to see how these features can be put together now and to imagine how they could be used in the future. What do you think?
