I’ve been talking about measure expressions (and Mosha has been plugging the fact you can do the same thing in MDX Scripts) for a while now, but up until now I’ve not got round to running any tests which compare their relative performance. However, I’ve been doing a PoC over the last week or so which has given me just this opportunity and I thought I’d share my results…
The scenario I had to implement was as follows: I had a sales fact table which showed the number of Units of a particular SKU of a Product sold per Month (plus various other dimensions), I had a prices fact table which showed the Price of each SKU per Month, and I wanted to be able to analyse sales values. Fairly straightforward so far: I just created a view which joined the two fact tables on SKU and Month and multiplied Units by Price to get the Value. However, I also had a requirement for the user to be able to pick any given Month and then look at Values for a range of Months calculated as if the Price of each SKU was frozen at the Month they had picked – this would allow them to remove the effect of price changes from any changes in sales Value. You can probably see that you might want to do something similar when using exchange rates.
Anyway, I originally tried to use measure expressions to implement this ‘fixed price’ requirement but after a lot of thought (and discussion with Jon) came to the conclusion that it wasn’t actually possible. I had one physical Time dimension but this was present in the cube twice as two role-playing dimensions, one linked to the Units measure group and one to the Prices measure group, and the fact that these two dimensions weren’t common to both measure groups made the numbers aggregate up incorrectly. Even when I tried linking the Units Time dimension to the Prices measure group and the Prices Time dimension to the Units measure group using many-to-many dimensions, which I thought should work in theory, I still didn’t get the correct numbers. Measure expressions evidently need the dimension concerned to have a regular relationship to both measure groups in order to work properly. So I went ahead and used an MDX Script calculation instead and was pleasantly surprised with the results.
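To give an idea of the shape of the MDX Script approach to the fixed-price calculation (this is just a rough sketch with made-up measure, dimension and member names, not the actual code from my PoC), the assignment might look something like the following, with the fixed Month hard-coded purely for illustration:
// Illustrative only: [Measures].[VALUE - FIXED PRICE] starts off as a copy of Units,
// and at the Product/Time leaves it is overwritten with Units multiplied by the Price
// taken from a fixed Month (the member key here is made up)
([Measures].[VALUE - FIXED PRICE], LEAVES([Product]), LEAVES([Time]))
    = [Measures].[VALUE - FIXED PRICE]
      * ([Measures].[PRICE], [Time].[Month].&[200504]);
In the real thing the fixed Month would of course come from whatever the user had selected rather than being hard-coded.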
In order to properly compare the performance of ordinary measures, measures which have measure expressions and measures whose values are calculated in the MDX Script, I created a test cube based on a subset of my data. I took six months of sales data, which resulted in a fact table of approx 7 million rows, the whole of my prices fact table (approx 6 million rows) and the whole of my Product dimension (170,000 SKUs), and built a cube with no aggregations or partitions and four measures: Units; Value precalculated in the fact table and exposed as a regular measure; Value calculated using a measure expression of the form [Measures].[Units]*[Measures].[Price]; and a regular measure exactly the same as Units but whose value was overwritten in the MDX Script with a calculation (based on Mosha’s recommendations here) similar to:
CALCULATE;
// overwrite the VALUE - SCRIPT measure (a straight copy of Units) at the
// leaves of Product and Time with Units multiplied by Price
([Measures].[VALUE - SCRIPT], LEAVES([Product]), LEAVES([Time]))
    = [Measures].[VALUE - SCRIPT] * [Measures].[PRICE];
I checked that all three Value measures returned the same result and then ran a very big query (26,000 rows and 4 columns) to see how they compared, stopping and starting the Analysis Services service before each query to try to reduce the effect of caching. On my fairly ordinary 2-way Windows 2003 box running the June CTP this mega-query ran for 13 seconds for both the ordinary Value measure and the measure expression version, which I thought was pretty amazing. The MDX Script version ran for 18 seconds, which I thought was still very good. Subsequent runs of each query on a warm cache all ran in 9 seconds. Obviously the next step would be to install AS2K on the same box and see how performance was on the same cube when using a calculated member to do this calculation, but unfortunately I haven’t got the time to do this and in any case I can’t imagine it would perform anywhere near as well.
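For what it’s worth, the kind of AS2K calculated member I have in mind would have to push the multiplication down to the leaf level by hand, something along these lines (a hypothetical sketch with made-up cube and level names, not anything I actually tested):
-- Hypothetical AS2K-style calculated member (names are made up): sum Units * Price
-- over the leaf-level SKUs and Months underneath the current Product and Time members
CREATE MEMBER [Sales Test].[Measures].[VALUE - CALC MEMBER] AS
'Sum(
    CrossJoin(
        Descendants([Product].CurrentMember, [Product].[SKU]),
        Descendants([Time].CurrentMember, [Time].[Month])
    ),
    [Measures].[Units] * [Measures].[Price]
)'
Evaluating that crossjoin cell by cell over 170,000 SKUs is exactly the sort of thing that hurts in AS2K, which is why I wouldn’t expect it to get anywhere near the figures above.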
So, what can we take away from this? It looks like the claims for big performance improvements in the AS2005 calculation engine might be justified. Where you can use measure expressions – and in most classic currency exchange rate scenarios you will be able to use them, even though I wasn’t able to – they seem to perform almost as well as if the value was precalculated in the fact table. Of course I didn’t test what impact writeback has on query performance, and this is the other essential factor in most financial applications, but in AS2K you always had to design your cube to account for the possibility that values might have been written back to the cube even if they hadn’t been, and even then performance could be bad. And even if you can’t use measure expressions, it looks like a pure MDX Script approach can still perform really well.
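Just to spell out what I mean by the exchange rate case (a purely hypothetical illustration, not something I tested here): the measure expression on a Sales Amount measure would typically be of the same form as the one above, for example
[Measures].[Sales Amount] * [Measures].[Exchange Rate]
(or a division, depending on which way round the rates are stored), with the rates sitting in their own measure group joined to Sales through shared Time and Currency dimensions.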
Hi Chris,

Long time no speak! I’ve been immersed in a project in London and haven’t even had the time to blog. I’m getting ready to go to Dallas at the weekend for PASS USA where I will do the same presentation as I did at PASS Europe. Hopefully Hurricane Rita won’t result in mob rule at the convention! Only kidding – it is forecast to hit southern Texas, not Dallas.

We have another project where a customer is evaluating AS2005. We had an MDX query with lots of calcs that was taking 40 seconds on CTP 15. Having installed CTP 16 (the September one), it is taking 2 seconds. A 20-fold improvement in performance! It really is in a different league now compared to AS 2000.

However, we were saddened to hear that the CACHE statement has been ditched. This customer has some really hard-core MDX calculations with massive volumes, which will take a while even with the latest performance boost. They were prepared to take the hit at ETL time and pre-calculate using the CACHE statement. Here’s hoping it will be re-introduced in a service pack!

Christian
Sounds good – I forgot to mention that the only reason I was using the June CTP rather than the September CTP was because I was also using some beta software from a third party which only worked with the June CTP.

I heard about the loss of CACHE() too, but I strongly suspect that it will reappear in a service pack. I think it was just one of those features they couldn’t get working properly in time for the release date.

Have fun in Dallas!
Hi Chris,

Interesting article 🙂 Still I think I have to reread this a few times, next to the part from Mosha and the Last Updated Price piece in the MDX Solutions book. Something bothers me with getting that latest price at the time leaf level over and over (because that will be time consuming). Now, the way I would try to work in AS2K is:

1. I would make a small (price per product) cube. Small, because prices don’t change that often. In this "rate" cube I would overwrite the month-level and product category-level aggregations to null by making use of calculated cells. (If a rate on a product has been defined twice in a month, the sum isn’t the answer. And the sum of 2 product rates in a category isn’t the rate of the product category.)

2. In a Sales cube, I would place the soldunits fact measure only.

3. Then, in the Sales cube I would define 2 calculated members named latestunitprice and sales. I would give them the value NULL.

4. I would make use of a calculated cell calculation in those calculated members, for a subcube with the product leaf level and the time members I would like people to query (…not every day). Then, I would use the lookupcube formula on the small Rates cube.

5. The lookupcube formula on Rates has to be a smart one. Not like: search day by day for the latest price, but more something like: search if the price changed the year before – yes? go to the latest month it changed; no? previous month – yes? and then the days.

Don’t you think this approach would speed up the process? Interested in your point of view, kind regards

JP
Hi JP,

Are you suggesting this approach for AS2K or AS2005? As an AS2K approach my thoughts are:

* The question of whether to have a small price cube, which only has entries when the price of a product changes, or a large price cube which contains a price for every product in every month regardless of whether it has changed or not, is an interesting one. I’d probably want to test both approaches but my feeling is that it might just be quicker to use a larger cube. The big advantage would be that you wouldn’t need to evaluate an MDX expression to find the last available price – it would just be a straight query to the cube. In my scenario, where you are potentially requesting thousands or millions of prices per query, I suspect that querying a large cube with no need for a calculated member would be faster than querying a small cube with a calculated member.

* The mixture of calculated members and calculated cells you describe to show the latest price would work, but I’m always a bit suspicious of using calculated cells in AS2K. You could achieve the same result using calculated members alone, the code would be much clearer and easier to maintain, and given the questionable performance of calculated cells in some circumstances I think it might be better from a performance point of view.

* Are you talking about using the LookUpCube function? If so, it’s usually a good idea to avoid using it because it can cause big query performance problems; virtual cubes offer much better performance when implementing cross-cube calculations.