measures.[internet sales amount] on 0,
crossjoin(
[Date].[Day Name].[Day Name].members,
[Date].[Day of Week].[Day of Week].members,
[Product].[Color].[Color].members
)
on 1
from [Adventure Works]
Microsoft Fabric, Power BI, Analysis Services, DAX, M, MDX, Power Query, Power Pivot and Excel
A lot of my work involves tuning MDX queries, and over the last few months I’ve come across several situations where calculation-heavy queries take a long time to execute but for the most part only seem to use one CPU on a multi-processor server. When I first saw this I made some enquiries about what was going on and found that this was indeed a feature of the AS calculation engine: only one query can be run per session at any one time, and for that query the work of the formula engine (unlike the storage engine) can’t be parallelised. Hohum.
However, then I realised that there are some scenarios where you can inject parallelism into a query. Consider the following query in Adventure Works:
select [Measures].[Amount] on 0,
bottomcount(
{[Scenario].[Scenario].&[1], [Scenario].[Scenario].&[2]}
*
[Account].[Account].[Account].members
*
[Date].[Date].[Date].members
*
[Department].[Departments].[Department Level 02].members
, 10,[Measures].[Amount])
on 1
from
[Adventure Works]
On my laptop, on a warm cache, it consistently executes in just under 40 seconds; my laptop is dual core and for the vast majority of the query time I see just 50% CPU usage. This is consistent with what I found out about the formula engine since the Account dimension has unary operators . However, if I open two different connections in SQL MS and run both of the following queries simultaneously:
select [Measures].[Amount] on 0,
bottomcount(
{[Scenario].[Scenario].&[1]}
*
[Account].[Account].[Account].members
*
[Date].[Date].[Date].members
*
[Department].[Departments].[Department Level 02].members
, 10,[Measures].[Amount])
on 1
from
[Adventure Works]
select [Measures].[Amount] on 0,
bottomcount(
{[Scenario].[Scenario].&[2]}
*
[Account].[Account].[Account].members
*
[Date].[Date].[Date].members
*
[Department].[Departments].[Department Level 02].members
, 10,[Measures].[Amount])
on 1
from
[Adventure Works]
…both queries finish in 20 seconds and I see 100% CPU usage.
What I then did was create a stored procedure with a function called ParallelUnion which, when passed to MDX set operations like the Bottomcounts in the queries above, opened two separate connections, executed the operations in parallel then Unioned the results and returned a set. Here’s an example of how it worked:
select measures.amount
on 0,
bottomcount(
ASSP.ParallelUnion(
"bottomcount(
{[Scenario].[Scenario].&[1]}
*
[Account].[Account].[Account].members
*
[Date].[Date].[Date].members
*
[Department].[Departments].[Department Level 02].members
, 10,[Measures].[Amount])"
,
"bottomcount(
{[Scenario].[Scenario].&[2]}
*
[Account].[Account].[Account].members
*
[Date].[Date].[Date].members
*
[Department].[Departments].[Department Level 02].members
, 10,[Measures].[Amount])"
), 10, [Measures].[Amount])
on 1
from
[Adventure Works]
To be clear, in this query the sproc executes the two Bottomcount set expressions passed in as strings on two different connections and sessions, and returns a unioned set containing the bottom 10 tuples for both [Scenario].[Scenario].&[1] and [Scenario].[Scenario].&[2], 20 tuples in all. Given that the overall bottom 10 tuples must be present in the 20 tuples the sproc returns I still need to do another Bottomcount but it’s relatively inexpensive. The query above executed in 17 seconds, less than half the time of the original query, and shows 100% CPU usage on my laptop.
I also created a ParallelGenerate function which worked like the existing Generate function in MDX and which would allow for more flexibility in splitting up queries in order to make use of more than two processors. Here’s an example:
select measures.amount
on 0,
bottomcount(
ASSP.ParallelGenerate(
{[Scenario].[Scenario].&[1],[Scenario].[Scenario].&[2] },
"bottomcount(
{[Scenario].[Scenario].currentmember}
*
[Account].[Account].[Account].members
*
[Date].[Date].[Date].members
*
[Department].[Departments].[Department Level 02].members
, 10,[Measures].[Amount])"
), 10, [Measures].[Amount])
on 1
from
[Adventure Works]
My next problem was that although I’d proved what was possible, my code wasn’t exactly robust. Luckily at this point the guys from the Analysis Services Stored Procedure Project helped me out: both Jon Axon and Mosha provided some excellent feedback and Greg Galloway took the time to rewrite it all. I’m extremely grateful to all three for their help, esppecially Greg for all the effort he put in. You can download the code (alond with lots of other useful sprocs) as part of the Analysis Services Stored Procedure Project here:
http://www.codeplex.com/Wiki/View.aspx?ProjectName=ASStoredProcedures
(Note you need to download the latest source code and compile it; the last compiled dll release doesn’t contain these functions)
Now to mention the drawbacks to all this (thanks again are due to Mosha for pointing several of these out):
That said, I still think these functions can make a massive difference to query performance in some circumstances. I’d be very interested to hear from anyone who does use them in their queries to know what the impact is.
I thought I’d avoided this meme, but since Darren Gosbell tagged me and since Mosha (who was also tagged) has already joined in, here are my five things: