Last week I was doing some performance tuning for a customer using Power BI on top of a large SSAS Multidimensional cube via a Live connection. Some of their reports were performing particularly badly and I uncovered a problem with the DAX generated by Power BI for tables with a large number of rows, fields from two or more large hierarchies and totals turned on.
The problem is very easy to reproduce; I’m going to do it using a simplified version of the Adventure Works cube that contains only the Date and Product dimensions. Take a look at the following table from a Power BI report:
It has the Date attribute from the Date dimension, and the Product attribute from the Product dimension, on rows and two measures on columns. The table has a large number of rows in it (both Date and Product are fairly large hierarchies) and if you look at the DAX generated by Power BI you can see that it only requests the first 501 rows. That’s fine – the real problem is that Power BI also generates a second DAX query to get the two values displayed in the Total line at the bottom of the table. The DAX looks something like this, and is easily identifiable because it uses the Row() function:
[sourcecode language="text" padlinenumbers="true"] EVALUATE CALCULATETABLE( ROW( "Sales_Amount", 'Internet Sales Facts'[Sales Amount], "Tax_Amt", 'Internet Sales Facts'[Tax Amt] ), KEEPFILTERS( GENERATE( KEEPFILTERS(VALUES('Product'[Product.Key0])), CALCULATETABLE( FILTER( KEEPFILTERS(VALUES('Order Date'[Date.Key0])), OR( NOT(ISBLANK('Internet Sales Facts'[Sales Amount])), NOT(ISBLANK('Internet Sales Facts'[Tax Amt])) ) ) ) ) ) ) [/sourcecode]
This query has something in it – I don’t know what – that means that it cannot make use of the Analysis Services Storage Engine cache. Every time you run it SSAS will go to disk, read the data that it needs and then aggregate it, which means you’ll get cold-cache performance all the time. On a big cube this can be a big problem. This is very similar to problems I’ve seen with MDX queries on Multidimensional and which I blogged about here; it’s the first time I’ve seen this happen with a DAX query though. I suspect a lot of people using Power BI on SSAS Multidimensional will have this problem without realising it.
This problem does not occur for all tables – as far as I can see it only happens with tables that have a large number of rows and two or more hierarchies in. The easy way to check whether you have this problem is to refresh your report, run a Profiler trace that includes the Progress Report Begin/End and Query Subcube Verbose events (and any others you find useful) and then refresh the report again by pressing the Refresh button in Power BI Desktop without changing it at all. In your trace, if you see any of the Progress Report events appear when that second refresh happens, as well as Query Subcube Verbose events with an Event Subclass of Non-cache data, then you know that the Storage Engine cache is not being used.
Also look at the Duration column in the trace for these events which shows the time in milliseconds that they took to execute. This will tell you how much of an impact this problem is having on your report refresh times.
The easy way to stop this happening is to turn off the totals row in the table:
Displaying the totals in a separate table also seems to avoid the problem, although of course it doesn’t look as good. Only putting one hierarchy in the table apart from your measures, also seems to solve the problem. You could also try all the usual methods to improve SSAS performance such as building aggregations on the cube.
The dev team is aware of this problem but it’s unlikely to be fixed in the short term.
UPDATE 8th September 2020: This problem has finally been fixed as part of the general SuperDAX performance improvements in Analysis Services 2019 CU5. See https://powerbi.microsoft.com/en-us/blog/improving-power-bi-performance-when-querying-multidimensional-models/ for more details.