How Defining Too Many Measures In A Live Connection Report Can Affect Power BI Query Performance

You probably know that it’s a best practice to build your Power BI datasets in a separate .pbix file from your reports – among other things it means that different people can develop the dataset and reports. You may also know that if you are building a report in Power BI Desktop with a Live connection to a published dataset or Azure Analysis Services you can define your own measures inside the report. While this is very convenient, if you create too many measures there’s a price to pay in terms of query performance.

To illustrate this, let’s say you have a super-simple dataset published to the Power BI Service (or a database in Analysis Services Tabular or Azure Analysis Services) that contains one table with three rows, two columns and a simple measure:
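To make this concrete, here’s a sketch of what that measure might look like. The table is called Sales and has a Product column (both visible in the query further down), but the name of the numeric column here is an assumption for illustration:

Sales Amount :=
SUM ( 'Sales'[Sales] )  -- the column name Sales is assumed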

If you open Power BI Desktop and create a Live connection to this dataset, you can create a new measure in the normal way and then use it in a table like so:

If you take a look at the DAX query that is generated by this table visual you’ll notice that the MyReportMeasure measure, defined in the report, is defined at the top of the query while the Sales Amount measure, defined in the dataset, is not:

DEFINE
    MEASURE 'Sales'[MyReportMeasure] = ( 
    [Sales Amount] + 1 
    )
    VAR __DS0Core =
        SUMMARIZECOLUMNS (
            ROLLUPADDISSUBTOTAL (
                'Sales'[Product],
                "IsGrandTotalRowTotal"
            ),
            "Sales_Amount", 'Sales'[Sales Amount],
            "MyReportMeasure", 'Sales'[MyReportMeasure]
        )
    VAR __DS0PrimaryWindowed =
        TOPN (
            502,
            __DS0Core,
            [IsGrandTotalRowTotal], 0,
            'Sales'[Product], 1
        )
EVALUATE
__DS0PrimaryWindowed
ORDER BY
    [IsGrandTotalRowTotal] DESC,
    'Sales'[Product]

Here’s what DAX Studio’s Server Timings shows about this query when it runs on a cold cache:

As you would expect it’s pretty quick, taking just 16ms.

In this example MyReportMeasure is something known as a query-scoped measure: it is created when the query runs and ceases to exist when the query finishes. The problem with this is that creating a query-scoped measure has some costs associated with it: for example, Power BI/Analysis Services needs to do some dependency analysis to find out which other measures it refers to, and the more other measures there are, the longer this takes.

To show the impact I generated the DAX definitions of 3000 measures in Excel and pasted them into the DEFINE clause of the query above:
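To give a flavour of what this looked like, here’s a sketch of the start of that padded DEFINE clause – the measure names and expressions here are invented for illustration, and in the real test there were 3000 of them before the rest of the query shown above:

DEFINE
    MEASURE 'Sales'[Test Measure 1] = ( [Sales Amount] + 1 )
    MEASURE 'Sales'[Test Measure 2] = ( [Sales Amount] + 2 )
    MEASURE 'Sales'[Test Measure 3] = ( [Sales Amount] + 3 )
    MEASURE 'Sales'[MyReportMeasure] = ( [Sales Amount] + 1 )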

[NB this is not exactly what happens in the real world: only the measures you need for a query, and the measures that these measures depend on, are defined in the query, but the dependency analysis happens all the same]

Here’s what Server Timings showed for the same query – which, remember, does not actually use any of the 3000 measures that I added:

Now 3000 measures might seem excessive but I have seen people with that many: you could have 100 base measures and then 30 variations on each of them (time intelligence calculations, financial calculations like actual vs forecast and so on). My advice would be to use calculation groups instead of creating so many measures, if you can – they will be a lot easier to develop and maintain, and easier for anyone developing a report to use. It’s also worth making clear that this problem only happens with query-scoped measures: no dependency analysis takes place at query time for measures defined on the dataset.
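To illustrate the calculation group suggestion: a single calculation group (created with a tool like Tabular Editor) containing items like the following could replace whole families of time intelligence measures. This is a sketch only, and it assumes a Date table marked as a date table:

-- Calculation item "PY": the previous year version of whatever measure is in context
CALCULATE ( SELECTEDMEASURE (), SAMEPERIODLASTYEAR ( 'Date'[Date] ) )

-- Calculation item "YTD": the year-to-date version
CALCULATE ( SELECTEDMEASURE (), DATESYTD ( 'Date'[Date] ) )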

Also, 1.5 seconds might not seem a big overhead but if you’re trying to squeeze all the performance you can out of a query, or trying to understand what’s contributing to the overall performance of your query, this is good to know about.

[Thanks to Jeffrey Wang for providing the information in this post]

Excel Cube Functions, Dynamic Arrays And Lambdas, Part 3: Grouping And Histograms

In the last post in this series I showed how you can use Excel’s new Lambda helper functions to return tables. In this post I’ll show you how you can use them to return a dynamic array of CubeSet functions which can be used to build a histogram and do the kind of ABC-type analysis that can be difficult to do in a regular Power BI report.

For the examples in this post I added some rows to the Excel Data Model table that I’m using to hold my source data:

The aim here is to put these products into an arbitrary number of groups, or buckets, based on their sales. To define these buckets I created another Excel table called Buckets that has three columns: the name of the bucket, and the lower bound and the upper bound of the sales amount that determines whether a product should fall into the bucket:

I then created two dynamic array formulas using the new Map function. In cell G2 I added this formula:

=
MAP(
 Buckets[Bucket Name], 
 Buckets[Lower Bound], 
 Buckets[Upper Bound], 
 LAMBDA(
  n,
  l,
  u, 
  CUBESET(
   "ThisWorkbookDataModel", 
   "FILTER([Sales].[Product].[Product].MEMBERS, [Measures].[Sales Amount]>=" & l & 
   " AND [Measures].[Sales Amount]<=" & u & ")", 
   n)
  )
)

And in cell H2 I added this formula:

=
MAP(
 G2#, 
 LAMBDA(
  s, 
  IF(
   CUBESETCOUNT(s)>0, 
   CUBEVALUE(
    "ThisWorkbookDataModel", 
    s, 
    "[Measures].[Sales Amount]"),
   0)
  )
)

Here’s what these two formulas return:

The formula in G2 takes three arrays – the values from the three columns in the Buckets table – and then loops over the values in those columns and uses the CubeSet function to return a set of the Products whose sales are between the lower and upper bounds. Since there are two rows in the Buckets table, this formula returns two sets. The formula in H2 uses the CubeValue function to return the aggregated sales amount for each set.

Last of all I created a column chart bound to the values in G2 and H2. This was a bit tricky to do, but I found the answer in this video from Leila Gharani – you need to create names that return the contents of the ranges G2# and H2# and then use the names in the chart definitions.

The beauty of all this is that when you edit the bounds in the Buckets table in the top left of the worksheet, edit the names of the buckets or add new buckets, the table and chart update automatically.

After doing all this I realised there was another, probably easier way to achieve the same result without using the Map function. All I needed to do was add new calculated columns to the Buckets table to return the sets and values:

Here’s the formula for the Set column in the table above:

=CUBESET(
"ThisWorkbookDataModel", 
"FILTER([Sales].[Product].[Product].MEMBERS, [Measures].[Sales Amount]>=" & 
[@[Lower Bound]] & 
"AND  [Measures].[Sales Amount]<=" & 
[@[Upper Bound]] & 
")", 
[@[Bucket Name]] & 
" set"
)

…and here’s the formula for the Sales column in that table:

= IF(
CUBESETCOUNT(
[@Set])>0, 
CUBEVALUE(
"ThisWorkbookDataModel", 
[@Set], 
"[Measures].[Sales Amount]"
),
0
)

I think this second approach should work with any version of Excel since the introduction of tables and cube formulas.

Excel Cube Functions, Dynamic Arrays And Lambdas, Part 2: Returning Tables

In the first post in this series I showed how to use the new Excel Lambda helper functions to return an array containing all the items in a set. That isn’t very useful on its own, so in this post I’ll show you how to generate an entire dynamic table using Excel cube functions and Lambda helper functions.

In this post I’ll be using the same source data as in my previous post: a table containing sales data with just two columns.

With this table added to the Excel Data Model/Power Pivot, I created two measures:
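The two measures used in the rest of this post are Sales Amount and Forecast Sales; as a sketch, and assuming the Sales table has suitably-named numeric columns, their definitions might look something like this:

Sales Amount := SUM ( Sales[Sales] )        -- column name assumed
Forecast Sales := SUM ( Sales[Forecast] )   -- column name assumed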

I then created two sets using the CubeSet function: one containing the Products (in cell B2 of my worksheet) and one containing the Measures (in cell B4), to use in my table:

=CUBESET("ThisWorkbookDataModel", "[Sales].[Product].[Product].MEMBERS", "Product Set")

=CUBESET("ThisWorkbookDataModel", "{[Measures].[Sales Amount], [Measures].[Forecast Sales]}", "Measure Set")

Here are the formulas shown in the worksheet:

And here’s the output – remember you only see the text in the third parameter displayed in the cell:

Now, here’s the fun part – a single formula that takes these sets and builds a table with the Measures on columns and the Products on rows:

=MAKEARRAY(
  CUBESETCOUNT(B2)+1,
  CUBESETCOUNT(B4)+1,
  LAMBDA(r,c,
   SWITCH(
    TRUE(),
    AND(r=1,c=1),
    "",
    c=1,
    CUBERANKEDMEMBER("ThisWorkbookDataModel",$B$2,r-1),
    r=1,
    CUBERANKEDMEMBER("ThisWorkbookDataModel",$B$4,c-1),
    CUBEVALUE("ThisWorkbookDataModel",
     CUBERANKEDMEMBER("ThisWorkbookDataModel",$B$2,r1),
     CUBERANKEDMEMBER("ThisWorkbookDataModel",$B$4,c-1)
    )
   )
  )
)

Here’s what this formula returns:

How does this work? Going through the MakeArray function step-by-step:

  • The first two parameters specify that the output will be an array with one more row than there are items in the Product set and one more column than there are items in the Measures set.
  • The third parameter returns a Lambda that is called for every cell in this array. This Lambda contains a Switch with the following conditions:
    • For the top-left cell in the array, return a blank value
    • In the first column, use the CubeRankedMember function to return the Products on the rows of the table
    • In the first row, use the CubeRankedMember function to return the Measures on the columns of the table
    • In the body of the table, use the CubeValue function to return the values

Here’s a slightly more ambitious version that returns the same table but adds a total row to the bottom:

=
LET(
 NumberOfRows,
 CUBESETCOUNT(B2)+2,
 NumberOfColumns,
 CUBESETCOUNT(B4)+1,
 MAKEARRAY(
  NumberOfRows,
  NumberOfColumns,
  LAMBDA(r,c,
   SWITCH(
    TRUE(),
    AND(r=1,c=1),
    "",
    AND(r=NumberOfRows,c=1),
    "Total",
    r=NumberOfRows,
    CUBEVALUE("ThisWorkbookDataModel",
     $B$2,
     CUBERANKEDMEMBER("ThisWorkbookDataModel",$B$4,c-1)),
    c=1,
    CUBERANKEDMEMBER("ThisWorkbookDataModel",$B$2,r-1),
    r=1,
    CUBERANKEDMEMBER("ThisWorkbookDataModel",$B$4,c-1),
    CUBEVALUE("ThisWorkbookDataModel",
     CUBERANKEDMEMBER("ThisWorkbookDataModel",$B$2,r-1),
     CUBERANKEDMEMBER("ThisWorkbookDataModel",$B$4,c-1))
    )
   )
  )
)

Two extra things to note here:

  • This is a great example of a complex formula where the new Excel Let function can be used to improve readability and prevent the same value being evaluated twice.
  • The values in the Total row are calculated in the Excel Data Model, not on the worksheet, by using the CubeSet function inside the CubeValue function. This means that the totals will be consistent with what you see in a PivotTable and are therefore correct.

This is still very much a proof-of-concept. I need to look at the performance of this approach (it may not be optimal and may need tuning), and I’m not sure how a table like this could be formatted dynamically (especially the Total row). It is exciting though!

Excel Cube Functions, Dynamic Arrays And Lambdas, Part 1: Getting All The Items From A Set

After my recent post on using Office Scripts and cube functions to generate Excel reports from Power BI data, Meagan Longoria asked me this question on Twitter:

To which I can only reply: guilty as charged. I have always loved the Excel cube functions even though they are maybe the least appreciated, least known and least used feature in the whole Microsoft BI stack. They have their issues (including sometimes performance) but they are great for building certain types of report in Excel that can’t be built any other way.

Anyway, the recent addition of new Lambda helper functions to Excel has made me particularly happy because they can be used with cube functions to overcome some limitations that have existed since cube functions were first released in Excel 2007, and to do some other cool things too. In this series of posts I’m going to explore some of the things they make possible.

Let’s start with something simple. In Excel, the CubeSet function can be used to return an (MDX) set of items. This set is stored in a single cell, though, and to extract each item into a cell on your worksheet you need to use the CubeRankedMember function. For example, let’s say I have a table called Sales on my worksheet:

…that is then loaded into the Excel Data Model (aka Power Pivot – although this works exactly the same if I use a Power BI dataset, Azure Analysis Services or SQL Server Analysis Services as my source):

What you can then do is use the CubeSet function to create a set of all the products like so:

=CUBESET("ThisWorkbookDataModel", "[Sales].[Product].[Product].MEMBERS", "Product Set")

…and then use the CubeRankedMember function to put each individual item of the set into a cell. Here’s a simple example worksheet, first with the formulas showing and then the results:

This example shows the fundamental problem that has always existed with CubeRankedMember though: in order to show all the items in a set you need to know how many items there are in advance, and populate as many cells with CubeRankedMember formulas as there are items. In this case see how the range B4:B6 contains the numbers 1, 2 and 3; these numbers are used in the formulas in the range C4:C6 to get the first, second and third items in the set.
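For example, the formula in C4 would be something like this, taking the rank number from B4:

=CUBERANKEDMEMBER("ThisWorkbookDataModel", $B$2, B4)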

If a fourth product was added to the table, however, it would not appear automatically – you would have to add another cell with another CubeRankedMember formula in it manually. I’ve seen some workarounds but they’re a bit hacky and require you to know what the maximum possible number of items in a set could ever be. Indeed that’s always been one of the key differences between cube functions and PivotTables: cube functions are static whereas PivotTables can grow and shrink dynamically when the data changes.

The new MakeArray function in Excel provides a really elegant solution to this problem: you can now write a single formula that returns a dynamic array with all the items in the set in it. Assuming that the same CubeSet exists in B2 as shown above, you can do the following:

=MAKEARRAY(CUBESETCOUNT($B$2), 1, LAMBDA(r,c,CUBERANKEDMEMBER("ThisWorkbookDataModel",Sheet3!$B$2,r)))

Here’s the output:

Notice how the formula in cell B4 returns an array that contains all three items in the set, spilling into the range B4:B6.

How does this work?

  1. The CubeSetCount function is used to get the number of items in the CubeSet in B2.
  2. The MakeArray function is then used to create an array with the number of rows returned by CubeSetCount and one column.
  3. In the third parameter of MakeArray, the Lambda function is used to return a function that wraps CubeRankedMember, which is then called with the current row number of the array.

The nice thing about this is that when more products are added to the Sales table they automatically appear in the output of the MakeArray formula in B4. So, for example, with two more products added to the Sales table like so:

Here’s the new output of the formula, showing the two new products returned in the array automatically:

This is not very useful on its own though. In my next post I’ll show you how this can be used to build a simple report.

Power BI Dataset Refresh, Column Encoding And The First Partition

If you’ve been following some of my recent posts about improving Power BI refresh performance by partitioning tables you will have seen a lot of screenshots that look like the one below:

It’s a visualisation from a report created by my colleague Phil Seamark (as detailed in this blog post) showing how long all the partitions in a dataset take to refresh. If you look at these visualisations you’ll probably ask the same question I did: why does the first partition always start before the others?

It turns out this is because when a table is refreshed, the first thing that has to happen is that a certain amount of data is read so that the type of encoding (Value or Hash) to use for each column can be determined. In most cases tables only contain one partition so it’s not obvious that this is happening, but when a table has more than one partition this work is done only for the first partition – which explains why the first partition seems to start before the others. You can’t avoid it happening but you can reduce the impact a little by using encoding hints (see here and here for more details): this process can be skipped for columns that have a Hash encoding hint, or which the engine knows in advance have to use Hash encoding, although it cannot be skipped for columns that have a Value encoding hint. What’s more, the Execute SQL event for the first partition has to complete before the Execute SQL events for all the other partitions can start.

[Thanks to Akshai Mirchandani for the information in this post]

Sample HTTP Endpoint For Querying Azure Analysis Services And Power BI Premium

I’ve always thought it would be useful if we could run queries against AAS or Power BI via a simple REST API, and I’ve just come across a really nice sample project created by one of my colleagues at Microsoft, David Browne, that allows you to do just that. You can find it here:

https://github.com/microsoft/azure-analysis-services-http-sample

From the ReadMe:

This sample is a HTTP proxy for XMLA endpoints, intended for use with Power BI Premium or Azure Analysis Services.

It’s implemented as ASP.NET Core 5 API Project. The main API is /api/Query which allows you to POST a DAX query and receive the results as a JSON result.

The sample is coded to pass-through authentication from the client to the XMLA endpoint. So to call the API either use HTTP BASIC auth over HTTPS, passing credentials with the request. It’s highly advised that this be a Service Principal, rather than an AAD user. To specify a Service Principal use a UserName of the form app:[ClientID]@[TenantID], and pass a Client Secret as the Password.

For better security, instead of passing a UserName/Password using HTTP BASIC auth, fetch a Bearer token for your XMLA endpoint. To fetch a token use the Resource ID https://analysis.windows.net/powerbi/api for Power BI, or https://*.asazure.windows.net for Azure Analysis Services
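So, for example, the DAX query you POST to /api/Query could be as simple as this sketch (the table and measure names here are assumptions about your dataset):

EVALUATE
SUMMARIZECOLUMNS (
    'Date'[Calendar Year],
    "Sales Amount", [Sales Amount]
)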

I can think of all kinds of cool stuff you could do with this…

Visualising Azure Analysis Services Processing Tasks With The Job Graph Events Sample

I’ll leave it to other people to get worked up about things like Power BI Premium gen2 – what really gets me excited is a new Profiler event type. And indeed there is a new one to play with – Job Graph – that you can use to visualise Azure Analysis Services processing jobs (it’s not enabled yet for the Analysis Services engine in Power BI).

More details about it, and how it can be used, are in the samples here:

https://github.com/microsoft/Analysis-Services/tree/master/ASJobGraphEvents

The data returned by the Job Graph event isn’t intelligible if you look at the text it returns in Profiler. However if you save a .trc file with Job Graph event data to XML you can use the Python scripts in the GitHub repo to generate DGML diagrams that can be viewed in Visual Studio, plus Gantt charts embedded in HTML. Of course to do this you’ll need to have Python installed; you’ll also need to have Visual Studio and its DGML editor installed (see here for details).

Here’s the DGML diagram for the refresh of a very simple AAS database with just one table in it, called People:

The eight nodes in a semi-circle around the top of the diagram represent the eight jobs that process the attribute hierarchies in the table. These eight jobs must be completed before the single partition in the table is processed (this is the node in the centre of the diagram) and that partition must be processed before the table itself is processed (this is the node at the bottom of the diagram).

Since this is the annotated output you can hover over a node in the diagram and see more details about it, including its duration and how long it was blocked for:

Notice how three nodes have dark backgrounds? These nodes represent the critical path. From the sample documentation:

When the job graph executes, there is always a job that finishes last before the engine can commit the change. This job that finishes last is the “critical dependency” for the commit; the entire commit needs to wait for this one job to finish before executing. This last job depends on other jobs, one of which finished after all the others. This is the next critical dependency. Tracing this path of critical dependencies forms the critical path, which helps engineers and customers identify why processing takes so long.

To diagnose slow refresh times, look for the critical path and start at the top. Look at “Blocked duration”, “Waiting duration”, and “Running duration”. If a job has a long blocked duration, it spent a long time waiting on other jobs. If a job has a long waiting duration, it was waiting for an available thread, so increasing maxParallelism could help. If a job has a long running duration, then the model might have to be changed to speed up the job.

Here’s another diagram, generated from a slightly more complex database with four tables, a calculated column and a calculated table:

There’s also a script that takes this data and generates a Gantt chart from it, but at the time of writing it seems to be a bit buggy; I can make it work if I comment out some code but I suspect that means the output isn’t accurate. Hopefully it will get fixed soon, but in the meantime here’s an example of what it produces:

I think this is going to be really useful when I’m troubleshooting processing performance problems. It would also be really cool to have this in DAX Studio (hint, hint)…

Visualising Power BI Premium And Azure Analysis Services Query Parallelism

In my last post I showed how to connect SQL Server Profiler up to a Power BI Premium dataset but I didn’t give you any examples of why this might be useful. In this post I’ll show you how you can use a Profiler trace to visualise all the queries run by a Power BI report, see when they start to run, see which ones run in parallel with each other and see what the overall time taken to run all the queries is.

Why is this important? When you’re tuning the performance of a Power BI report the first thing to do is to look at the performance of the individual DAX queries run and make them run as fast as possible. However when a Power BI report is rendered any one query is likely to be run at the same time as several other queries run for the same report, and this will have an impact on its performance. How much of an impact there is will depend on how many queries need to be run and the number of back-end v-cores available on your Premium capacity, or the number of QPUs available on your Azure Analysis Services instance if you’re using a Live connection to AAS. The more v-cores/QPUs you have available, the more of the work needed by your queries can be run in parallel; you can see a table listing the number of v-cores for each Premium SKU here, and the number of QPUs for each Azure Analysis Services SKU here. As a result, if you have reports with a large number of visuals that generate slow DAX queries, scaling up your Power BI Premium capacity or AAS instance may improve overall report performance. Reducing the number of visuals on your report and/or reducing the number of visuals needed to display the same information will also reduce the number of queries that need to be run and therefore improve overall performance.

As I showed last week, SQL Server Profiler can be used to create a trace that logs all the queries run against a Power BI Premium dataset in the same way as it can be used with Azure Analysis Services. Assuming that you have a trace running that uses only the Query End event, this will give you a list of all the queries that are being run along with their start time, end time, duration and a lot of other interesting information. A table with all this data in can still be difficult to interpret though, so I built a Power BI template for a report that visualises all these queries and helps you understand the amount of parallelism that is taking place. You can download the template file here.

To use it, first you need a trace file. Make sure that no-one else is running reports on the Premium capacity you want to test (creating a Power BI Embedded capacity for testing purposes is a good idea) and then, when the trace is running, refresh your report using the technique I described in the “Use the network tab” section of this blog post. This will also allow you to correlate what you see in the trace with the information you see in the DevTools tab in the browser.

Then save the trace file you have created to XML by going to File/Save As/Trace XML File:


Next, open the Power BI template file and when prompted, enter the full path of the trace XML file you just created:


A new Power BI report will then be created. If you want to point the report to a different trace XML file all you need to do is change the value of the TraceXMLFile Power Query parameter.

On the first page you’ll see the name of the trace XML file you connected to plus a bar chart showing each Query End event (with each query identified by a number) on the y axis and the duration of each query on the x axis:


It’s not quite a simple bar chart though. What I’ve done is:

  • Found the start time of the first query run
  • Calculated the start time of every other query in the file relative to this first start time (although, unfortunately, Profiler only gives you start times rounded to the nearest second which means you can’t know exactly when a query starts)
  • Created a stacked bar chart where the first value in the stack is this relative start time and the second value is the duration of the query in seconds (see the sketch after this list for one way of calculating the relative start time)
  • Made the colour of the relative start time transparent, so you only see the blue sections of the bar for the query durations. This gives you a waterfall-like effect and allows you to see which queries are run in parallel. This also makes it easy to see the total amount of time taken to run your queries, from the start of the first query to the end of the last query, which is just as useful to know as the duration of any single query.
  • There’s also a drillthrough page so you can right-click on a bar and see a table with the DAX query for the query you clicked on, as well as its start time and duration.
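For reference, here’s a sketch of the kind of DAX calculated column that could compute a relative start time like this – the table and column names are assumptions for illustration, not necessarily what the template uses internally:

Relative Start Seconds =
DATEDIFF (
    CALCULATE ( MIN ( Queries[StartTime] ), ALL ( Queries ) ),  -- start time of the first query in the trace
    Queries[StartTime],                                         -- start time of the current query
    SECOND
)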

It’s a very basic report, I know, and I would be interested to know if you have any ideas about other ways of visualising this data. What’s more, a visual like this raises more questions than I know how to answer… yet. For example, one thing I want to investigate is the effect that query interleaving has on this graph and both perceived and actual report performance. So stay tuned for more blog posts on this subject!


Excel Dynamic Arrays And The CubeSet Function

Among all the exciting announcements made at Ignite last week, one you may have missed (even in all the Excel-related announcements here) was that dynamic arrays have finally reached GA. Ever since they were announced I’ve been interested in how they can be used with Excel cube functions, which allow you to get data from Analysis Services, Power BI and the Excel Data Model/Power Pivot into cells on the worksheet, and I’ve blogged about this once already. Even though right now the CubeValue function can’t be used with dynamic arrays – alas – there are still some interesting ways dynamic arrays and cube functions can be used together and in this post I’ll show you another one I’ve found.

Say you have the following set of tables in the Excel Data Model:


And the following measure that counts the number of customers who bought something:

Distinct Customers:=DISTINCTCOUNT('Internet Sales'[CustomerKey])

Let’s also say you have on a worksheet the following table of data showing the number of distinct customers broken down by product, created using the CubeMember and CubeValue functions:


Here are the formulas for this table:


So far, all very straightforward. Now let’s say you want to use a dynamic array to filter the rows in this table so you only see the products that have more than a certain number of distinct customers, and you want to see a grand total for this filtered list of customers, like so:


Here are the formulas to achieve this:


To explain what’s going on here:

  • Cell C2 contains the text “ThisWorkbookDataModel”, the name of the connection to the Excel Data Model. Doing this makes the other Excel formulas here much easier to read!
  • Cell F2 – which I’ve turned into a named range called FilterThreshold – contains the number of customers which a product must exceed to be shown.
  • Cell E5 contains the dynamic array Filter function that filters the rows shown in the first two screenshots above and only returns the products that exceed the threshold. The formula is:
    FILTER(B5:B134,C5:C134>FilterThreshold)
  • Cell E6 does something similar to E5 but returns the number of distinct customers for each product. If all I wanted was the products and the number of distinct customers I wouldn’t have needed two separate formulas, I could have used one, but I need to add the grand total too…
  • Cell E4 contains the formula:
    CUBESET($C$2, E5#, "Total")
    This creates a set from the spill reference E5#, which contains the cells containing CubeMember formulas returned by the Filter function. For some reason CubeSet returns an error if you try to use a dynamic array formula in its second parameter but using a spill reference works ok; this is why I had to use separate formulas in E5 and E6.
  • Finally, cell F4 contains the formula that returns the total distinct count for all products returned in the filtered list using CubeValue and the set created in cell E4:
    CUBEVALUE($C$2, "[Measures].[Distinct Customers]", E4)
    Since this is a distinct count the only way to get the correct value here is using CubeSet, CubeValue and the DAX measure – there would be no way to calculate the correct value using Excel formulas. The same could be said of almost any other DAX measure that wasn’t a simple aggregation.

You can download the sample workbook for this post here.

Power BI Report Builder And RSCustomDaxFilter

If you’re building DAX queries using Power BI Report Builder you might notice something that looks like a new DAX function called RSCustomDaxFilter. For example, here’s a simple DAX query built from the Adventure Works Tabular demo database, with one measure and one parameter built on Calendar Year:


Note that the “Enable Multi Value Parameters” option has been selected. Here’s what you’ll see if you view the text of the DAX query in the Query Designer:

EVALUATE 
SUMMARIZECOLUMNS(
RSCustomDaxFilter(
 @DateCalendarYear,
 EqualToCondition,
 [Date].[Calendar Year],
 Int64
 ), 
"Internet Total Sales", 
[Internet Total Sales]
)

But what is RSCustomDaxFilter? If you run your report and see what happens on the server using SQL Server Profiler, here’s the query that actually gets run (in this case I selected the years 2013 and 2014 for the parameter):

EVALUATE
SUMMARIZECOLUMNS (
    FILTER (
        VALUES ( 'Date'[Calendar Year] ),
        ( 'Date'[Calendar Year] = VALUE ( "2013" ) )
            || ( 'Date'[Calendar Year] = VALUE ( "2014" ) )
    ),
    "Internet Total Sales", [Internet Total Sales]
)

What has happened is that RSCustomDaxFilter has been replaced with an expression using the DAX Filter() function that implements the filter on the selected years; it’s just a placeholder for a dynamically-generated DAX expression that is substituted in at runtime.

Why is it needed? Handling multi-value parameters is difficult in DAX when you don’t know how many values are going to be passed to the parameters (it’s a subject I’ve blogged about here and here) and some kind of dynamic code generation is a reasonable solution to this problem.

There is one drawback with this approach though – it can generate a DAX query that is too long to be executed. Here’s the error message you’ll see:


The specified query is too complex to be evaluated as a single statement.

I’m not sure what the maximum length of a query is in DAX – I suspect it’s 32768 characters. You’ll hit this limit if you create a parameter on a field with thousands of values in it and then select all but a few of these values; from the example above you can imagine how long the resulting DAX query could be.


There’s no single workaround for this but some custom DAX (such as the example here) will be necessary; you’ll just need to ensure that the DAX query is as short as possible.
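One possible workaround – a sketch of one option, not the only one – is to turn off multi-value parameter handling and pass all the selected values as a single pipe-delimited string parameter instead, then use the DAX PathContains function to do the filtering, so the query stays the same length however many values are selected. Here @YearList is a hypothetical parameter containing a value like “2013|2014”:

EVALUATE
SUMMARIZECOLUMNS (
    FILTER (
        VALUES ( 'Date'[Calendar Year] ),
        PATHCONTAINS ( @YearList, 'Date'[Calendar Year] )
    ),
    "Internet Total Sales", [Internet Total Sales]
)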
