Jon Axon mailed me recently with an interesting bug he’d come across concerning many-to-many dimensions, where it looks like the AS engine is trying to be a bit too clever for its own good as far as query optimisation goes. It’s reproducible on AS2005 SP2 and Katmai RC0, and I logged it on Connect here: https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=357469
Here’s an example on Adventure Works that Jon gave me. The following query:
SELECT
[Measures].[Customer Count] on 0
FROM [Adventure Works]
Shows the total number of customers as being 18484. Looking at the Sales Reason dimension (which has a many-to-many relationship with the measure group Customer Count is from), if you run a relational query on the underlying data source you can see that not every order was associated with a sales reason – this is the key point here. Now if we run the query:
SELECT
[Measures].[Customer Count] on 0
FROM [Adventure Works]
WHERE [Sales Reason].[Sales Reason].[All Sales Reasons]
We can see that it returns the same result as the first query, 18484, as you would expect. But what if we want to find the number of customers who specified a sales reason? You would think that the following query would do that:
SELECT
[Measures].[Customer Count] on 0
FROM [Adventure Works]
WHERE [Sales Reason].[Sales Reason].[All Sales Reasons].Children
But it doesn’t, it returns 18,484 again. However if you run this query which should be equivalent to the previous query:
SELECT
[Measures].[Customer Count] on 0
FROM [Adventure Works]
WHERE {[Sales Reason].[Sales Reason].&[1],
[Sales Reason].[Sales Reason].&[2],
[Sales Reason].[Sales Reason].&[3],
[Sales Reason].[Sales Reason].&[4],
[Sales Reason].[Sales Reason].&[5],
[Sales Reason].[Sales Reason].&[6],
[Sales Reason].[Sales Reason].&[7],
[Sales Reason].[Sales Reason].&[8],
[Sales Reason].[Sales Reason].&[9],
[Sales Reason].[Sales Reason].&[10]}
It returns the value we’re looking for, 17022. Not good! What I suspect is happening here is that when the query optimiser sees the expression [Sales Reason].[Sales Reason].[All Sales Reasons].Children in the Where clause, it assumes that it’s equivalent to [Sales Reason].[Sales Reason].[All Sales Reasons] which it is in most cases, just not here. Interestingly if you look in Profiler and run the queries on a cold cache, you see exactly the same activity when [Sales Reason].[Sales Reason].[All Sales Reasons].Children is in the Where clause as when [Sales Reason].[Sales Reason].[All Sales Reasons] is there; but when you explicitly list all of the individual sales reasons, as in the last query, because Sales Reason has to resolve itself through the Internet Sales Order Details dimension which is ROLAP, you see SQL queries being fired off against the relational database.
Some years ago (!) I observed part of this behavior (the one using the all member or the children property) and (guess it) it\’s "by design" 🙂
The bridge table is considered in the query only when the related m2m dimension is involved in the query specifying a subset of its members. As you said, the observed behavior is a result of the combined action of query optimizer and m2m resolution.
I think that a good solution would be a different one than simply changing the behavior of the query optimizer. I would add a property to measures (or measure groups) to specify if the evaluation of that specific measure/measure group has to always consider the related m2m dimension and its intermediate measoure group or if this passage can be optimized. This would solve the issue I described in my m2m paper (http://www.sqlbi.eu/manytomany.aspx) related to the distinct count alternative measure.
Chris, did you hear about m2m issues related to empty bridge table. Looks like a bug to me. I posted it here:http://social.msdn.microsoft.com/Forums/en-US/sqlanalysisservices/thread/e2b9ed52-dd8d-4698-9a61-adda07909e41Any thoughts are welcome.Thank you,Igor