First Look At SSAS 2016 MDX On DirectQuery

Following on from my last post covering DirectQuery in Power BI, I thought it might be interesting to take a look at the way MDX queries are supported in SSAS Tabular 2016 CTP3 DirectQuery mode.

There were a lot of limitations when using DirectQuery in SSAS Tabular 2012/4, but for me the showstopper was the fact that it only worked if you were running DAX queries against your model. Historically the only major client tool that generated DAX queries to get data was Power View, and Power View was/is too limited for serious use, so that alone meant that none of my customers were interested in using DirectQuery. Although we now have Power BI Desktop and PowerBI.com, which also generate DAX queries, the fact remains that the vast majority of business users will still prefer to use Excel PivotTables as their primary client tool – and Excel PivotTables generate MDX queries. So, support for MDX queries in DirectQuery mode in SSAS 2016 means that Excel users will now be able to query a Tabular model in DirectQuery mode. This, plus the performance improvements made to the SQL generated in DirectQuery mode, means that it’s now a feature worth considering in scenarios where you have too much data for SSAS Tabular’s native in-memory engine to handle or where you need to see real-time results.

At the time of writing the most recent release of SQL Server 2016 is CTP3. If you want to test out the BI features in SQL Server 2016 CTP3 in an Azure VM, I highly recommend Dan English’s blog post here showing how to set one up. To test DirectQuery mode you need to use the older 1103 compatibility mode for your project and not the latest 1200 compatibility mode. This is documented in the release notes:
https://msdn.microsoft.com/en-us/library/dn876712.aspx#bkmk_2016_ctp3_0

image

Once you’ve created your project, you can enable DirectQuery mode in the same way as in previous versions by following the instructions here. The DirectQueryMode property on Model.bim needs to be set to On, and the QueryMode property on the project should be set to DirectQuery.

For testing purposes I downloaded the 2016 version of the Adventure Works DW database and restored it to SQL Server, then created a SSAS Tabular model containing only the DimDate table to keep things simple. I created one measure in the model with the following definition:
TestMeasure:=COUNTROWS(‘DimDate’)

First of all, I ran the following MDX query:

SELECT
{[Measures].[TestMeasure]} 
ON 0,
[DimDate].[CalendarYear].[CalendarYear].MEMBERS 
ON 1
FROM
[Model]

image

Using a Profiler trace (yes, I know I should be using XEvents but Profiler is so much more convenient for SSAS) I could see the SQL generated by SSAS in the Direct Query Begin and Direct Query End events. For the MDX query above there were three SQL queries generated. The first looks like it is getting the list of years displayed on the Rows axis:

SELECT 
TOP (1000001) [t0].[CalendarYear] AS [c15]
FROM 
(
  (SELECT [dbo].[DimDate].* FROM [dbo].[DimDate])
)
AS [t0]
GROUP BY [t0].[CalendarYear] 

The second SQL query gets the measure value requested:

SELECT 
TOP (1000001) [t0].[CalendarYear] AS [c15],
COUNT_BIG(*)
AS [a0]
FROM 
(
  (SELECT [dbo].[DimDate].* FROM [dbo].[DimDate])
)
AS [t0]
GROUP BY [t0].[CalendarYear] 

The third is simply a repeat of the first query.

However, there’s one important thing to say here: there are going to be significant changes and improvements to the SQL generated before RTM, so don’t read too much into the queries shown here.

There are several limitations in CTP3 that may or may not remain at RTM. One that you may run into is the that you can only use fully qualified MDX unique names in your queries, so

[DimDate].[CalendarYear].&[2010]

…will work but

[2010]

…will not. To be honest, I consider it a best practice to use fully qualified unique names anyway so I’m not too bothered about this. Drillthrough doesn’t work at the moment either.

MDX calculations defined in the WITH clause of a query are supported, which is really useful if you’re writing custom MDX queries for SSRS. For example the following query works and generates the same SQL (though with a few more executions) as the previous query:

WITH
MEMBER [Measures].[TestMDXCalcMeasure] AS 
SUM(NULL:[DimDate].[CalendarYear].CURRENTMEMBER,
[Measures].[TestMeasure])

SELECT
{[Measures].[TestMeasure],
[Measures].[TestMDXCalcMeasure]} 
ON 0,
[DimDate].[CalendarYear].[CalendarYear].MEMBERS 
ON 1
FROM
[Model]

image

All in all, this looks like a solid piece of work by the SSAS dev team. Go and test it! I would love to hear from anyone with genuinely large amounts of data (maybe APS/PDW users?) regarding their experiences with 2016 DirectQuery. Recently I’ve been working with a customer using SSAS Multidimensional in ROLAP mode on top of Exasol and I’ve been surprised at how well it works; I would imagine that 2016 DirectQuery and APS would be an even better combination.

One last thought. If we get the ability to query a cloud-based Power BI mode with MDX and MDX on DirectQuery is supported in Power BI too, why would you bother paying for an expensive SQL Server Enterprise/BI Edition licence plus hardware to use DirectQuery when you can get almost the same functionality in the cloud for a fraction of the price?

14 thoughts on “First Look At SSAS 2016 MDX On DirectQuery

  1. Hello Chris,

    After reading your article i only have a question left, why should i prefer directquery on top of APS more than ROLAP ? Is ther anything i can’t do with SSAS MD Rolap that i can do with SSAS Tabular Direct Query ? (No troll there, i still have some issue with the semantic layer of Tabular and i need solid argument to change 😉 )

    Thanks,

    • I have been playing around with directquery in CTP3. There a few benefits that I can see for my use.

      ROLAP generates SQL queries which select ALL columns in measure groups, whereas, directquery only selects the columns it needs. This is a big advantage if you are using columnstore indexes.

      ROLAP dimensions are not really real time. When querying them, it just processes the part of the dimension it needs to satisfy the query and caches it for later. There are large query performance penalties here, plus, you will run into problems with managing that cache when data in the underlying table(s) are changed. In the tabular directquery model all data is queried from the database so it is more suitable for real time data.

      Also, I have found that when using MDX for directquery a single SQL query is generated for each measure selected, whereas, if the equivalent query is written in DAX, a single query with all the measures are generated (where possible).

    • I have been playing around with directquery in CTP3. There a few benefits that I can see for my use.

      ROLAP generates SQL queries which select ALL columns in measure groups, whereas, directquery only selects the columns it needs. This is a big advantage if you are using columnstore indexes.

      ROLAP dimensions are not really real time. When querying them, it just processes the part of the dimension it needs to satisfy the query and caches it for later. There are large query performance penalties here, plus, you will run into problems with managing that cache when data in the underlying table(s) are changed. In the tabular directquery model all data is queried from the database so it is more suitable for real time data.

      Also, I have found that when using MDX for directquery a single SQL query is generated for each measure selected, whereas, if the equivalent query is written in DAX, a single query with all the measures are generated (where possible).

      • Thanks Richard, this is very useful information. I believe optimisation for column store indexes was one of the main design goals for DirectQuery.

        Regarding your comment about ROLAP generating queries that select all columns, I’m not sure that’s true in all cases: it will depend on the measure aggregation types. See https://blog.crossjoin.co.uk/2012/11/27/storage-engine-caching-measures-and-measure-groups/ for more details. Regarding your last comment, as I said in this post, don’t worry too much about the SQL that is currently being generated by MDX queries because there are a lot of performance improvements (which should reduce the number of queries being generated, for example) to come before RTM.

  2. Good question. My guess is that the SQL generated in DirectQuery mode will be better – but we will only be able to test properly in later CTPs when the improvements to the SQL generation I mentioned have been made. The key difference between Multidimensional ROLAP and Tabular DirectQuery is that in DirectQuery *all* query logic is translated to SQL, whereas in ROLAP only the Storage Engine operations are translated to SQL. This means that in Multidimensional ROLAP there’s less opportunity to optimise the SQL queries that are generated.

    Of course, there are still plenty of other reasons to choose Multidimensional over Tabular (eg MDX calculations) so it’s not an easy decision to make.

  3. So I have been doing some basic APS (AU4) testing with External tables pointing to Hadoop via Polybase, and then surfacing that with a tabular model in DirectQuery mode, then building an Excel Pivot table against the data model. Got all that? 🙂 Works pretty slick, access Hadoop directly without moving tons of data into the data model, performance isn’t that great (yet). But I am going to continue dow this thread of logic and see where it takes me….good stuff!

  4. So basically when i run sql profiler and run the query against my db it executes in milliseconds (indexes are in place for the queried table), but why when i browse the cube the same result is generated in like 15 – 20 seconds?

    Does the query plan generation and dax to sql conversion part takes the additional time or is it something to do with network latency?

    • To be clear, you’re running the SQL generated by SSAS and it’s fast, but when you query SSAS itself it’s slow? I wouldn’t expect network latency etc to make that much of a difference.

  5. Hi Chris, thanks for the great post. I have a question related to DAX-to-SQL translations:

    I have a DAX measure the uses the COUNTROWS() function as follows:
    COUNTROWS(SUMMARIZE(CALCULATETABLE(…)))

    When this gets converted to SQL in DirectQuery mode, the outer-most SQL generates a statement like this:
    ( COUNT_BIG(DISTINCT [t13].[CCN]) + MAX(CASE WHEN [t13].[CCN] IS NULL THEN 1 ELSE 0 END) ) AS [a0]

    This is hitting a standard star-schema data warehouse structure that uses clustered columnstore indexes on the fact tables (and we’re on SS2016, CU2 applied). Any ideas on why the MAX() statement gets added in the DAX-to-SQL translation, and if there’s a way to modify the tabular model or the DAX expression so that it doesn’t get added? It’s unnecessary for my purposes and ends up producing a bad query plan. Removing the MAX() expression from the SQL code improves the query performance by 95%.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s