Back in December I wrote a post explaining why you may encounter the “The resultset of a query to external data source has exceeded the maximum allowed size” error when working with degenerate dimensions in DirectQuery mode in Power BI. In that post I showed an example of how a moderately complex DAX expression in a measure can trigger the error even when you’re applying a filter in a visual; in this post I’ll show how you can use aggregations to avoid that error.
A quick recap: say you have a dataset built from the ContosoRetailDW sample database with a Date dimension table and a fact table called FactOnlineSales with more than 12 million rows in it.
There are two measures:
Sales Amount = SUM ( FactOnlineSales[SalesAmount] )

Error Demo =
VAR s = [Sales Amount]
RETURN IF ( s > 0, s )
If you build a table visual with the SalesOrderNumber column (a degenerate dimension from the fact table with the same granularity as the fact table) on rows and just the [Sales Amount] measure, and you filter to a single date, then everything works ok:
…but if you add the [Error Demo] measure too, a SQL query is generated to get all the values from the SalesOrderNumber column, which returns more than a million rows and triggers the error:
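To illustrate the problem, the "dimension query" takes roughly the following shape. This is a hedged sketch rather than the exact SQL Power BI generates (the real query is machine-generated, with aliased subqueries and a TOP clause), but the key point is the same: it asks for every distinct SalesOrderNumber with no filter on the selected date:

```sql
-- Approximate shape of the dimension query sent in DirectQuery mode:
-- every distinct value of the degenerate dimension column is requested,
-- unfiltered, which exceeds the one-million-row limit on FactOnlineSales
SELECT DISTINCT [SalesOrderNumber]
FROM [dbo].[FactOnlineSales]
```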
For more details please see the original post.
Since the error is caused by a DirectQuery query to get all the values from the SalesOrderNumber column, one workaround is to build an Import mode aggregation table that contains all the distinct values from that column.
It was easy to do this in my case using Power Query: I duplicated the FactOnlineSales query, removed all the columns apart from SalesOrderNumber, and applied a "Remove Duplicates" transform. This left me with a new table containing just one column, SalesOrderNumber, that I could load into my dataset using Import mode:
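The M for the aggregation table query might look something like this. It's a sketch only: the server and database names are assumptions based on the ContosoRetailDW sample, and your own source steps will differ, but the last two steps correspond to the "remove other columns" and "remove duplicates" transforms described above:

```powerquery
// Duplicate of the FactOnlineSales query, reduced to the distinct
// values of SalesOrderNumber and loaded in Import mode
let
    Source = Sql.Database("localhost", "ContosoRetailDW"),   // assumed connection details
    Fact = Source{[Schema = "dbo", Item = "FactOnlineSales"]}[Data],
    KeepColumn = Table.SelectColumns(Fact, {"SalesOrderNumber"}),
    Deduplicated = Table.Distinct(KeepColumn)
in
    Deduplicated
```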
I then configured this new table as an aggregation table for the FactOnlineSales table, with the SalesOrderNumber column from the new table set as a GroupBy on the SalesOrderNumber column from FactOnlineSales:
With this aggregation table configured, the table with both measures in no longer gives an error:
A Profiler trace shows that the query to get the data needed for the table is still in DirectQuery mode, but the “dimension query” to get all the distinct values from SalesOrderNumber now hits the Import mode aggregation:
I know what you’re going to say though: “If I’m using DirectQuery for my fact table I don’t want to build Import mode aggregations!”. Well yes, there are some limitations to point out with this approach.

In my opinion it will work well if you are using DirectQuery mode because you have very large fact tables – even a few billion rows – but your data doesn’t change very frequently (say only once a day). In that scenario refreshing an aggregation table containing just a single column could be very fast and take up a relatively small amount of memory in Power BI, at least in comparison with an Import mode table containing all the columns from the fact table. Using incremental refresh on the aggregation table will also help, but unfortunately you can’t use a hybrid table as an aggregation table at the time of writing this post, so you can’t mix Import mode and DirectQuery for the aggregation table.

On the other hand, if you’re using DirectQuery because your data changes frequently during the day then I don’t think this approach will work, because it will be impossible to keep the contents of your Import mode aggregation table in sync with the contents of your DirectQuery fact table.