Avoiding The “Maximum Allowed Size” Error In Power BI DirectQuery Mode With Aggregations On Degenerate Dimensions

Back in December I wrote a post explaining why you may encounter the “The resultset of a query to external data source has exceeded the maximum allowed size” error when working with degenerate dimensions in DirectQuery mode in Power BI. In that post I showed an example of how a moderately complex DAX expression in a measure can trigger the error even when you’re applying a filter in a visual; in this post I’ll show how you can use aggregations to avoid that error.

A quick recap: say you have a dataset built from the ContosoRetailDW sample database with a Date dimension table and a fact table called FactOnlineSales with more than 12 million rows in it.

There are two measures:

Sales Amount = SUM(FactOnlineSales[SalesAmount])

Error Demo =
VAR s = [Sales Amount]
RETURN
    IF(s > 0, s)

If you build a table visual with the SalesOrderNumber column (a degenerate dimension from the fact table, with the same granularity as the fact table) on rows and just the [Sales Amount] measure in it, and you filter down to a single date, everything works OK:

…but if you add the [Error Demo] measure too, a SQL query is generated to get all the values from the SalesOrderNumber column; this returns more than a million rows and triggers the error:

For more details please see the original post.

Since the error is caused by a DirectQuery query to get all the values from the SalesOrderNumber column, one workaround is to build an Import mode aggregation table that contains all the distinct values from that column.

It was easy to do this in my case using Power Query: I duplicated the FactOnlineSales query, removed all the columns apart from SalesOrderNumber, and applied a “Remove Duplicates” transform. This left me with a new table containing just one column, SalesOrderNumber, that I could load into my dataset in Import mode:
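As a rough sketch, the M for the aggregation table’s query might look something like this (the server and database names here are placeholders, and in practice the first steps would simply be a duplicate of your existing FactOnlineSales query):

```
let
    // Placeholder connection details – replace with your own server/database
    Source = Sql.Database("localhost", "ContosoRetailDW"),
    FactOnlineSales = Source{[Schema = "dbo", Item = "FactOnlineSales"]}[Data],
    // Keep only the degenerate dimension column
    OnlyOrderNumber = Table.SelectColumns(FactOnlineSales, {"SalesOrderNumber"}),
    // Equivalent of the "Remove Duplicates" transform
    Deduplicated = Table.Distinct(OnlyOrderNumber)
in
    Deduplicated
```

Because the result contains only the distinct SalesOrderNumber values, the Import mode table stays as small as possible.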

I then configured this new table as an aggregation table for the FactOnlineSales table, with the SalesOrderNumber column from the new table set as a GroupBy on the SalesOrderNumber column from FactOnlineSales:

With this aggregation table configured, the table visual with both measures in it no longer gives an error:

A Profiler trace shows that the query to get the data needed for the table is still in DirectQuery mode, but the “dimension query” to get all the distinct values from SalesOrderNumber now hits the Import mode aggregation:

I know what you’re going to say though: “If I’m using DirectQuery for my fact table I don’t want to build Import mode aggregations!”. Well yes, there are some limitations to point out with this approach. In my opinion it will work well if you are using DirectQuery mode because you have very large fact tables – even a few billion rows – but your data doesn’t change very frequently (say only once a day). In that scenario refreshing an aggregation table containing just a single column could be very fast and take up a relatively small amount of memory in Power BI, at least in comparison with an Import mode table containing all the columns from the fact table.

Using incremental refresh on the aggregation table will also help, but unfortunately you can’t use a hybrid table as an aggregation table at the time of writing this post, so you can’t mix Import mode and DirectQuery for the aggregation table. On the other hand, if you’re using DirectQuery because your data changes frequently during the day then I don’t think this approach will work, because it will be impossible to keep the contents of your Import mode aggregation table in sync with the contents of your DirectQuery fact table.

Why DAX Window Functions Are Important For Performance In Power BI DirectQuery Mode

The new DAX window functions (announced here, more details on Jeffrey Wang’s blog here and here) have generated a lot of excitement already – they are extremely powerful. However, one important benefit of using them has not been mentioned so far: they can give you much better performance in DirectQuery mode, because they make it more likely that aggregations are used. After all, the fastest DirectQuery datasets are the ones that can use aggregations (ideally Import mode aggregations) as much as possible.

To illustrate this, here’s a very simple dataset with a fact table in DirectQuery mode and a Date dimension table in Dual mode built on the SQL Server AdventureWorksDW2017 sample database:

Let’s start off with a simple measure that sums up the values in the SalesAmount column:

Sales Amount = SUM('Internet Sales'[SalesAmount])

When you use it in a table visual with the CalendarYear column from the Date table like so:

…Power BI can get the data it needs with a single SQL query. I won’t show the whole query here, but it’s a simple Group By and returns exactly what you’d expect if you run it in SQL Server Management Studio:
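The generated query isn’t reproduced here, but as an illustrative sketch only (not the actual SQL Power BI emits, which is more verbose), a year-granularity Group By over the AdventureWorksDW2017 tables looks something like this:

```sql
-- Illustrative shape of a year-granularity query, not Power BI's generated SQL
SELECT
    d.[CalendarYear],
    SUM(f.[SalesAmount]) AS [SalesAmount]
FROM [dbo].[FactInternetSales] AS f
INNER JOIN [dbo].[DimDate] AS d
    ON f.[OrderDateKey] = d.[DateKey]
GROUP BY d.[CalendarYear];
```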

Now let’s say you want to do a year-on-year growth calculation. To do this, you’ll need to be able to find the Sales Amount for the previous year. One way to do this in DAX would be to use the SamePeriodLastYear function like so:

LY Sales Amount V1 =
CALCULATE (
    [Sales Amount],
    SAMEPERIODLASTYEAR ( 'Date'[FullDateAlternateKey] )
)

Used in a table it gives the correct result:

However the SQL query generated for this visual is now a lot more, ahem, verbose, and because the DAX time intelligence functions are all resolved at date granularity, it now returns Sales Amount summed up by date rather than by year:

If you write the same previous year sales measure using the new Offset function like so:

LY Sales Amount V2 =
CALCULATE (
    [Sales Amount],
    OFFSET (
        -1,
        ALLSELECTED ( 'Date'[CalendarYear] ),
        ORDERBY ( 'Date'[CalendarYear], ASC )
    )
)

…you get the same result as before:

…but now the SQL query is much simpler and returns data at the year granularity, as you’d want:

I’m not a SQL expert so I won’t comment on the SQL generated – besides, it may change in the future – but the most important implication of this is that the version of the measure that uses Offset is more likely to be able to use aggregations.

For example, if you add an extra import-mode table to the dataset with the data from the fact table aggregated to year granularity:

…and set it up as an aggregation table:

…then any queries at the year granularity should use it. As you would expect, the query for the visual shown above with the measure using SamePeriodLastYear misses the aggregation and goes to the DirectQuery fact table:

The query for the measure using Offset, though, can use the aggregation and there is no DirectQuery activity at all:

This is just one example. I’m almost certain there are other ways to write this calculation without using Offset that will also hit the aggregation, but they won’t be as elegant as the Offset version. What’s more, as your measures get more and more complex it gets harder and harder to write DAX that results in simple, efficient SQL in DirectQuery mode, and the new window functions are a massive help here.
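For example, one alternative that also filters directly at the year granularity is to shift the CalendarYear filter manually. This is only a sketch to illustrate the idea – I haven’t tested it against this dataset, and it may behave differently from the Offset version at total level or when multiple years are selected:

```
LY Sales Amount V3 =
VAR CurrentYear = SELECTEDVALUE ( 'Date'[CalendarYear] )
RETURN
    CALCULATE (
        [Sales Amount],
        // Replace the existing Date filters with a filter on the previous year
        REMOVEFILTERS ( 'Date' ),
        'Date'[CalendarYear] = CurrentYear - 1
    )
```

Because this measure only ever filters the CalendarYear column, the resulting query should also stay at year granularity, but it’s clearly less elegant than the Offset version.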