How To Optimise The Performance Of MDX Queries That Return Thousands Of Rows

One problem I encounter on a regular basis is how to optimise the performance of MDX queries that return thousands, hundreds of thousands, or even millions of rows. The advice I give is always the same:

Reduce the number of rows that your query returns!

Yes, there are some things you can change in your queries and cube design to improve performance, but these are the same things I’d suggest for any query (build aggregations, rewrite MDX, partition etc etc). In my opinion, if you have a query that returns a ridiculously large number of rows you are doing something fundamentally wrong.

There are three reasons why SSAS developers write this kind of query:

  1. They are doing a data-dump from SSAS to another system. Mostly the developer doesn’t realise this though, since the other system is Excel-based and the end user has disguised their requirement as a report. In most cases, user education about how to use Excel with SSAS results in an approach that doesn’t require dumping thousands of rows of data to an Excel worksheet.I will admit that I have seen a few cases where developers need to dump data out of SSAS for other purposes, and have no option but to use SSAS because they have to add complex calculations that can only feasibly be implemented in MDX. These are very rare though, and most of the time using SQL queries against the underlying relational database works a lot better.
  2. The end users have specified a report that returns lots of data, because that’s just what they want, dammit! Often this is to recreate a report built in a previous system that, at some point in the 1970s, was printed out into a gigantic book every month. My argument here is that a report should return no more data than can be seen on a screen without scrolling. If you need to scroll in a report, you probably should be giving the end user more parameters to filter that report so they can find the data they want to see more easily instead.Of course it’s one thing to know what you should be doing, it’s another thing entirely to tell the CFO that their requirements are stupid. If you can’t convince your end users that you know better than them, you have my sympathy. Usually I find that having to choose between the poor performance of what they want and the better performance of a different approach helps them come to their senses.
  3. Finally, the way that SSRS handles drilling down in reports often leads report developers to bring back vast amounts of data. The advice to increase the number of parameters for filtering is equally relevant here, but you can also use MDX techniques like this one to implement drill down in a much more efficient way.

At the end of the day, SSAS just isn’t optimised for returning large resultsets – it was designed to return PivotTable-style queries, which are always relatively small. You can get good performance for large resultsets if you know what you’re doing, you have the time, and you’re lucky, but you’ll usually be better off rethinking your requirements or choosing a different tool.

13 thoughts on “How To Optimise The Performance Of MDX Queries That Return Thousands Of Rows

  1. In my case the users think that an OLAP cube is (finally) the access to the raw data. so they try to dump the data out from the prod system through the cube. (else they cant access the data quickly, they have to rely on CR taking weeks of delay)
    they open the cube in Excel, dump near all the dimensions in excel (day level, products etc…) resulting in huge resultsets.

    the problem here, the IT department dont understand the needs and dont want to provide a selfservice access point (like an OData web site) and they want to force the users to go through the cube.
    the users are conducting advanced analysis (like complex basket analysis) which cant be answered by a cube. and IT dont want to create small dedicated cubes… 🙂
    so an infinite loop of non communication…
    after millions dollars invest in the cube (yes millions), finally the power users are using the old way with CR to get the raw data!!!! (but the project is a success… crazy, no?) 🙂

  2. I totally agree on your solution 🙂 – lets hope the Power BI Desktop visualizations can help us to develop better and less detailed reports and we can move away from hopelessly long SSRS reports that are exported to PDF and/or exported to Excel.

  3. While I face the same problems you mention (users that wants to download tons of data), I must say that problems does not come from MDX or the Cube itself, but from the queries that Excel produces when you use PivotTable. Rewriting the same queries with a better MDX (that, for example, avoid to extract subtotals when you don’t want them) allows to have very good performance…it’s quite common to extract up to 600.000 rows (don’t ask why the end user “needs” it…) in couple of seconds. Of course we had to write our own Excel plugin in order to rewrite the generated MDX on the fly….

    • Hi Davide, I know about the problems with Excel subtotals and the MDX it generates, and I have blogged about the workarounds several times. For fairly simple queries you’re right, you can get good performance, but it’s not always the case. The point I wanted to make here was that it’s the requirement itself that needs to be questioned.

      • Nope we created it for one of our client. Basically we get all the objects used in the OLAP Pivotable and then we create a new “standard” pivottable on-the fly writing the MDX on our own, and this optimizing it *a lot*.

  4. Thanks Chris.

    I see this issue on occasion too. I always describe SSAS as a high performance AGGREGATED query engine. It’s a poor non-aggregated query engine. For non-aggregated queries, I recommend querying off the underlying relational data warehouse.

    Cheers – Adam .

      • Hi Imke

        In principle, I try to include the lowest grain (i.e. detail level) of fact tables in the cube. This is ‘Kimball’ best practice. It’s difficult to try to guess what grain users may want. Rolling up to a higher grain may cause you issues, when you try to associate the fact with dimensions, where the lowest grain of the dimension is the detail level. What key do you associate on? I think Chris wrote a post on this recently. There are some exceptions of course. One example may be where you have some detector transactions which capture data each second. In your Cube, if the lowest grain of your time dimension was minute or hour, then you might want to aggregate that detector data to the minute/hour when loading it into the cube.

        Hope that helps – Adam.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s