What’s New In Analysis Services 2022?

There was a time when a new release of SQL Server – and therefore a new release of SQL Server Analysis Services – was the most exciting thing in the world for me. New functionality! New things to blog about! Not so now that my focus, and Microsoft’s, is on Power BI and we get cool new functionality there every month. All the same there are still a lot of people running SSAS on-premises and SQL Server 2022 has just been released, so what’s new and is it worth upgrading?

There’s nothing about Analysis Services in the SQL Server 2022 GA announcement blog post, but you can find a list of what’s new here:

https://learn.microsoft.com/en-us/analysis-services/what-s-new-in-sql-server-analysis-services?view=asallproducts-allversions

Most of the items listed here are performance optimisations, many of which have been available in Power BI and Azure Analysis Services for some time now (although we haven’t got parallel execution plans for DirectQuery in Power BI just yet 😉). Probably the most important, in my opinion, is MDX Fusion, whose main effect is to improve the performance of Excel PivotTables and cube-function-based reports connected to SSAS Tabular – I saw some cases where MDX queries ran a lot faster when this rolled out for Power BI. All the features are applicable to SSAS Tabular and some to SSAS Multidimensional too; there are also a few other minor optimisations that aren’t listed. The new cloud-billing model announced here only applies to the SQL core engine, not to SSAS, SSRS or SSIS.

There are no deprecated features but Multidimensional’s data mining features and PowerPivot for SharePoint are now officially discontinued (which means they are no longer supported – see the definition of “discontinued” here).

As a Microsoft employee, obviously I’m going to say you should upgrade to SQL Server 2022. As a member of the Power BI product group I would add that you should also consider migrating all your on-prem SSAS Tabular models to Power BI Premium if you can: Power BI Premium is the strategic direction for enterprise BI as well as self-service BI and that’s where all the investment is going from now on. Don’t think about migrating to Azure Analysis Services instead – we’re already encouraging people to migrate from AAS to Premium! My colleague Dan English just posted a great walkthrough of the new AAS to Premium migration experience here, which is worth checking out.

Migration from SSAS Multidimensional to Power BI is a much more difficult task. You’ll need to rebuild your existing cubes and calculations from scratch manually in Power BI (there are no tools to automate migration because it isn’t possible to build them). Simple cubes should be easy to rebuild; more complex cubes – for example those with parent/child hierarchies, custom rollups or SCOPE statements – will be much more difficult to migrate, and you may need to accept that you can’t reproduce some functionality exactly. You can always run SSAS Multidimensional in a virtual machine in Azure if you need to move to the cloud, and there are VM images to make that easy.

Power BI Dataset Refresh, Column Encoding And The First Partition

If you’ve been following some of my recent posts about improving Power BI refresh performance by partitioning tables, you will have seen a lot of screenshots of a visualisation from a report created by my colleague Phil Seamark (as detailed in this blog post) showing how long all the partitions in a dataset take to refresh. If you look at these visualisations you’ll probably ask the same question I did: why does the first partition always start before the others?

It turns out this is because when a table is refreshed, the first thing that has to happen is that a certain amount of data is read so that the type of encoding (Value or Hash) to use for each column can be determined. In most cases tables only contain one partition so it’s not obvious that this is happening, but when a table has more than one partition this step happens only for the first partition – which explains why the first partition seems to start before the others. You can’t avoid this happening, but you can reduce the impact a little by using encoding hints (see here and here for more details): the step can be skipped for columns that have a Hash encoding hint, or which the engine knows in advance have to use Hash encoding, although it cannot be skipped for columns that have a Value encoding hint. What’s more, the Execute SQL event for the first partition has to complete before the Execute SQL events for all the other partitions can start.
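Incidentally, if you want to check which encoding the engine actually chose for each column, you can query the model metadata. Here’s a rough sketch of such a query – note that the DAX INFO functions are only available in recent versions of the engine, and that the column names and encoding values are assumptions based on the DISCOVER_STORAGE_TABLE_COLUMNS DMV that this function mirrors, so verify them against your own model:

EVALUATE
SELECTCOLUMNS (
    -- keep only real data columns, not row-number or other internal structures
    FILTER ( INFO.STORAGETABLECOLUMNS (), [COLUMN_TYPE] = "BASIC_DATA" ),
    "Table", [DIMENSION_NAME],
    "Column", [COLUMN_ID],
    -- COLUMN_ENCODING_INT should be 1 for Hash and 2 for Value
    "Encoding", [COLUMN_ENCODING_INT]
)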

[Thanks to Akshai Mirchandani for the information in this post]

Migration From Analysis Services Multidimensional – Your Feedback Needed!

Do you have Analysis Services Multidimensional cubes in production? Although I know it’s a long time since I last posted any Multidimensional/MDX content here I hope I still have some readers who do. If so, then you may be able to help me.

The reason I ask is that in my current job at Microsoft I’m working with some colleagues to investigate what it is that prevents people from migrating away from Analysis Services Multidimensional to Analysis Services Tabular, Azure Analysis Services, Power BI or indeed any other BI platform. Is it missing features? Is it organisational inertia? Cost? Is it the fact that your Multidimensional cubes still work well and there’s no point in migrating when you wouldn’t see much benefit? Something else? Has the idea of migration ever even crossed your mind?

In particular, what I need is:

  • Examples of Analysis Services Multidimensional cubes you have in production. All I want is the Visual Studio project or an XMLA script of the database, I do not need or want your data. Please leave a message for me here if you’re willing to do this and I’ll let you know where to send your cubes to.
  • Your thoughts on this subject – please leave a comment below. You know how I love a good argument (sorry, discussion)!

I already have plenty of ideas and theories regarding this topic, but what I need is hard evidence (hence the request for the cube definitions) and quotes from actual customers.

Last of all, don’t read too much into this: it’s a research project, nothing more. I can’t comment on, or make any promises about, the future of Multidimensional or new features that might be added to Analysis Services Tabular or Power BI.

[UPDATE November 2020: I’ve now finished my research, so there’s no need to send me your cubes. Thanks to everyone who sent one!]

Power BI Report Builder And RSCustomDaxFilter

If you’re building DAX queries using Power BI Report Builder you might notice something that looks like a new DAX function called RSCustomDaxFilter. For example, here’s a simple DAX query built from the Adventure Works Tabular demo database, with one measure and one parameter built on Calendar Year.


Note that the “Enable Multi Value Parameters” option has been selected. Here’s what you’ll see if you view the text of the DAX query in the Query Designer:

EVALUATE
SUMMARIZECOLUMNS (
    RSCustomDaxFilter (
        @DateCalendarYear,
        EqualToCondition,
        [Date].[Calendar Year],
        Int64
    ),
    "Internet Total Sales",
    [Internet Total Sales]
)

But what is RSCustomDaxFilter? If you run your report and see what happens on the server using SQL Server Profiler, here’s the query that actually gets run (in this case I selected the years 2013 and 2014 for the parameter):

EVALUATE
SUMMARIZECOLUMNS (
    FILTER (
        VALUES ( 'Date'[Calendar Year] ),
        ( 'Date'[Calendar Year] = VALUE ( "2013" ) )
            || ( 'Date'[Calendar Year] = VALUE ( "2014" ) )
    ),
    "Internet Total Sales", [Internet Total Sales]
)

What has happened is that RSCustomDaxFilter has been replaced with an expression using the DAX Filter() function that implements the filter on the selected years; it’s just a placeholder for a dynamically-generated DAX expression that is substituted in at runtime.

Why is it needed? Handling multi-value parameters is difficult in DAX when you don’t know how many values are going to be passed to the parameter (it’s a subject I’ve blogged about here and here) and some kind of dynamic code generation is a reasonable solution to this problem.
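Incidentally, if you ever rewrite one of these queries by hand, the chain of OR conditions in the generated query above is equivalent to the much more compact DAX IN operator. Here’s a hand-written version of the same query, using the same table and measure names, that you could test in DAX Studio:

EVALUATE
SUMMARIZECOLUMNS (
    FILTER (
        VALUES ( 'Date'[Calendar Year] ),
        -- IN is shorthand for the chain of OR conditions that
        -- RSCustomDaxFilter expands to at runtime
        'Date'[Calendar Year] IN { 2013, 2014 }
    ),
    "Internet Total Sales", [Internet Total Sales]
)

Report Builder doesn’t generate the IN form itself, but the two queries return the same results.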

There is one drawback with this dynamic code generation though – it can generate a DAX query that is too long to be executed. Here’s the error message you’ll see:


The specified query is too complex to be evaluated as a single statement.

I’m not sure what the maximum length of a query is in DAX – I suspect it’s 32768 characters. You’ll hit this limit if you create a parameter on a field with thousands of values and then select all but a few of them; since each selected value adds another OR condition to the generated filter, as in the example above, you can imagine how long the resulting DAX query could be.


There’s no single workaround for this: you’ll need to write some custom DAX (such as the example here) and ensure that the resulting query is as short as possible.
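As a sketch of what that custom DAX might look like, here’s one option using the pipe-delimited-list technique described in the “A New Approach To Handling SSRS Multi-Valued Parameters in DAX Queries” section below. Because the selected values arrive as a single string, the query text stays the same length no matter how many values are selected – although, as that section explains, this approach can have performance problems of its own:

EVALUATE
VAR SelectedYears = "2013|2014" -- in a real report this string would come from an SSRS parameter
RETURN
    SUMMARIZECOLUMNS (
        FILTER (
            VALUES ( 'Date'[Calendar Year] ),
            -- search the pipe-delimited list for the current year
            PATHCONTAINS ( SelectedYears, 'Date'[Calendar Year] )
        ),
        "Internet Total Sales", [Internet Total Sales]
    )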

SSASDiag: A Tool To Help Diagnose Analysis Services Problems

There are a lot of great community-developed tools out there for Analysis Services developers to use (BI Developer Extensions, DAX Studio, Tabular Editor, Analysis Services Query Analyzer to name a few) and they have saved me vast amounts of time and effort over the years. When I joined Microsoft last month I came across one which I had never seen before but which is nevertheless quite mature and feature-rich: the SSAS Diagnostics Tool or SSASDiag for short. It’s available on GitHub here:
https://github.com/ssasdiag/SSASDiag

…and you can read the documentation here:
https://github.com/ssasdiag/SSASDiag/wiki/SSAS-Diagnostics—Analysis


It’s an open source tool developed by the people who support Analysis Services here at Microsoft and is intended to help them collect and analyse the information they need to troubleshoot on-premises SSAS issues, but it’s available for anyone to use. I haven’t had a chance to take a proper look at it myself yet, unfortunately, but I thought it would be worth bringing to the attention of any SSAS fans out there.

[Thanks to Jon Burchel for providing all the background information for this post]

DAX Median() Function Does Not Work On Tables With More Than 2 Billion Rows

An interesting – if obscure – fact I learned recently is that a small number of DAX functions such as Median() do not work on tables with more than 2 billion rows in Analysis Services Tabular, Azure AS and Power BI.

Itā€™s quite easy to reproduce in Power BI. The following M expression returns a table with two billion and four rows:

let
    Source =
        List.Repeat(
            {1, 2, 3, 4},
            500000001
        ),
    #"Converted to Table" =
        Table.FromList(
            Source,
            Splitter.SplitByNothing(),
            null,
            null,
            ExtraValues.Error
        ),
    #"Changed Type" =
        Table.TransformColumnTypes(
            #"Converted to Table",
            {{"Column1", Int64.Type}}
        )
in
    #"Changed Type"

It takes some time to load this table – around twenty minutes – but because there are only four distinct values in the table the resulting .pbix file is only 31KB, thanks to the way Power BI compresses data.

If you load this table into your dataset, call it VeryBigTable and create the following measure:

Median Test = MEDIAN(VeryBigTable[Column1])

…and use the measure in a visual, you’ll see the following error:


The current query cannot be evaluated for the ‘VeryBigTable (42)’ table, because the table contains more than two billion rows.

What’s more, the error will always occur even if you apply a filter to the table that returns fewer than two billion rows. The same problem occurs with some other functions, such as Percentile(), but it’s worth pointing out that the vast majority of DAX functions work as normal on tables with more than two billion rows – for example, in the pbix file used here the Sum() and CountRows() functions not only work fine but return instantly.
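If you want to check this for yourself, a quick query like the following – runnable against the dataset from DAX Studio, using the same VeryBigTable name as above – confirms both the row count and that simple aggregations are unaffected:

EVALUATE
ROW (
    "Row Count", COUNTROWS ( VeryBigTable ), -- returns 2,000,000,004
    "Sum", SUM ( VeryBigTable[Column1] ) -- works fine despite the table size
)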

Luckily, in the case of the Median() function, there is an easy workaround because you can calculate a median in other ways such as the one described on the DAX Patterns site here. The code is a lot more verbose but it works on a 2 billion+ row table.
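I won’t reproduce the DAX Patterns code exactly here, but a rough sketch of the general approach – find the smallest value whose cumulative row count reaches the halfway point – looks something like the measure below. Note that this simplified version ignores the refinement needed to average the two middle values when the row count is even:

Median Workaround =
VAR HalfPoint = COUNTROWS ( VeryBigTable ) / 2
RETURN
    MINX (
        FILTER (
            VALUES ( VeryBigTable[Column1] ),
            -- count the rows with values up to and including the current one
            VAR CurrentValue = VeryBigTable[Column1]
            RETURN
                CALCULATE (
                    COUNTROWS ( VeryBigTable ),
                    VeryBigTable[Column1] <= CurrentValue
                ) >= HalfPoint
        ),
        VeryBigTable[Column1]
    )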


SSAS Tabular 2019, Calculation Groups And Migration From SSAS Multidimensional

With the release of CTP 2.3 of SQL Server 2019 today there was big news for Analysis Services Tabular developers: Calculation Groups. You can read all about them in detail in this blog post:

https://blogs.msdn.microsoft.com/analysisservices/2019/03/01/whats-new-for-sql-server-2019-analysis-services-ctp-2-3/

In my opinion this is the most important new feature in DAX since… well, forever. It allows you to create a new type of calculation – which in most cases will be a time intelligence calculation like year-to-date or previous-period growth – that can be applied to multiple measures; basically the same thing that we have been doing in SSAS Multidimensional for years with the time utility/shell/date tool dimension technique. It’s certainly going to solve a lot of problems for a lot of SSAS Tabular implementations, many of which have hundreds or even thousands of measures, one for every combination of base measure and calculation type needed.
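To give you a flavour of how they work: each calculation item in a calculation group is a DAX expression that uses SELECTEDMEASURE() as a placeholder for whichever measure the user has selected. Here’s a rough sketch of two typical calculation items, assuming a model with a 'Date' table marked as a date table – see the blog post linked above for the official syntax and tooling:

-- Calculation item "YTD": turns any measure into its year-to-date version
CALCULATE ( SELECTEDMEASURE (), DATESYTD ( 'Date'[Date] ) )

-- Calculation item "Previous Year": the same measure over the prior year
CALCULATE ( SELECTEDMEASURE (), SAMEPERIODLASTYEAR ( 'Date'[Date] ) )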

I’m not going to repeat any of the detailed technical information in the blog post here, though. Instead the point I want to make is that this is very big news for SSAS Multidimensional users too. In the past couple of years many people who have existing SSAS Multidimensional implementations have thought about migrating to SSAS Tabular so they can take advantage of its new features or move to the cloud, and indeed many of them have already migrated successfully. However, up to now, the biggest blocker for those wanting to migrate from Multidimensional to Tabular has been the fact that some complex calculations that can be expressed in MDX cannot be recreated (or recreated easily and efficiently) in DAX, because DAX has not had an equivalent of calculated members not on the Measures dimension or the MDX SCOPE statement.

Calculation groups do not remove this roadblock completely, but they do remove the roadblock for a large group of existing SSAS Multidimensional users whose only complex calculation requirement is a time utility/shell/date tool dimension. As a result these SSAS Multidimensional users will now be able to migrate to SSAS Tabular 2019, Azure Analysis Services or Power BI if they want to. Only those people who have more exotic uses for calculated members not on the Measures dimension (which are not very common at all) and those who use SCOPE statements (a larger group – many people working with financial data use SCOPE statements heavily) will find that Multidimensional is still the correct platform for them.

How The New IsAvailableInMDX Property For Analysis Services Tabular Can Reduce Memory Usage And Speed Up Processing

Following on from my post the other week about the new RowsetSerializationLimit server property, I thought it would be a good idea to write about why the new IsAvailableInMDX property (announced in the same blog post) is so important. In fact, I would say that everyone using Analysis Services Tabular 2017 (CU7 or higher) or Azure Analysis Services should spend some time investigating it because the potential benefits in terms of reduced memory usage and faster processing times are significant, especially for larger models.

First of all, what does it actually do? As the blog post says, it allows you to stop attribute hierarchies from being built on columns when you don’t need them. But what are attribute hierarchies? They are structures that are used only when you are querying your Tabular model using MDX; Excel PivotTables, for example, generate MDX queries when they are connected to Analysis Services Tabular whereas Power BI always generates DAX queries. An attribute hierarchy allows a column on a table to be used on the rows or columns axis of an MDX query, and in Excel that means you will be able to drag that field onto the rows or columns area of a PivotTable. Attribute hierarchies are used by some DAX functionality too – for example the TreatAs() function (at least for now) needs them to be present to work. Frustratingly, the DAX functionality that does need attribute hierarchies is not documented.

To give you an example, consider a Tabular model that contains a table with three columns, Product, Customer and Sales, and a measure that sums up the values in the Sales column.


I can query this Tabular model in Power BI, for example by creating a Matrix visualisation.


I can also get the same values out using an Excel PivotTable.


Now the reason I can create this PivotTable is that Analysis Services Tabular has created attribute hierarchies on the Customer and Product columns. However, the important thing to understand is that Analysis Services Tabular creates attribute hierarchies on every column of every table by default, including the Sales column. This allows me to create a PivotTable with the distinct values from Sales on the rows of the PivotTable.


You’re probably thinking: why would I ever want to use Sales – a measure column – like this? And the answer is you probably wouldn’t, even though Tabular allows this by default. What’s more, building the attribute hierarchy for Sales makes processing slower and the resulting hierarchy uses memory, so all this comes at a cost. The IsAvailableInMDX property is therefore very useful because it allows you to stop attribute hierarchies from being built on columns like Sales where they serve no real purpose.

Unfortunately at the time of writing SSDT doesn’t allow you to set the IsAvailableInMDX property, but the good news is that the latest versions of Tabular Editor do.


Setting IsAvailableInMDX to false for the Sales field has no impact at all in Power BI, so long as you are not using functionality like TreatAs() that needs it. In Excel, it just means that it is no longer possible to drag Sales onto rows or columns in a PivotTable – the Sales Amount measure still works.
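To make the TreatAs() caveat concrete, here’s a deliberately contrived sketch – assuming the table in this model is called Sales, and inventing the filter values – of a measure that would be affected, because at the time of writing TreatAs() needed the attribute hierarchy of the column it filters:

Sales For Selected Amounts :=
CALCULATE (
    [Sales Amount],
    -- TREATAS turns this list of values into a filter on the Sales column,
    -- which at the time of writing required that column's attribute hierarchy
    TREATAS ( { 100, 200 }, Sales[Sales] )
)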


As a result, there are two recommendations that can be made:

  • If you are not using any client tools that generate MDX queries (such as Excel), or you want to prevent your users from using them, and you can be sure that doing so does not affect any of your existing Power BI reports or DAX calculations, you can set IsAvailableInMDX to false on every column of every table
  • If you are using client tools that generate MDX you can still probably set IsAvailableInMDX to false on every measure column and not lose any important functionality

How much of an impact will doing this have on processing times and memory usage? It depends, but it could be a lot. The anecdotal evidence on Twitter is promising.


I did my own (not particularly scientific) test using a table with five million rows and ten columns, each of which contained the integers between one and five million. Here’s the M query to generate such a table without the need for an external data source:

let
    Source =
        Table.FromColumns(
            List.Repeat(
                {{1..5000000}},
                10
            )
        ),
    #"Changed Type" =
        Table.TransformColumnTypes(
            Source,
            List.Transform(
                Table.ColumnNames(Source),
                each {_, Int64.Type}
            )
        )
in
    #"Changed Type"

On my laptop, with IsAvailableInMDX set to true for all ten columns, a full process on this table took around 105 seconds and the table size reported by Vertipaq Analyzer was 381MB. After changing IsAvailableInMDX to false for all ten columns, the time for a full process went down to around 81 seconds and the table size was down to 191MB.

In summary, this is one of those seemingly obscure technical changes that turns out to be way more useful than you might think. If you test out setting IsAvailableInMDX on your Tabular model, please leave a comment letting me know what kind of impact it had!

[Thanks to Daniel Otykier for providing a lot of information for this post]

UPDATE 2022: the issue with TREATAS was fixed a long time ago, so you should be able to use all DAX functionality when IsAvailableInMDX is turned off.

A New Approach To Handling SSRS Multi-Valued Parameters in DAX Queries

Back in 2012 I wrote a blog post explaining how to handle multi-value parameters in DAX queries in Reporting Services reports. The approach I took back then was basically:

  1. Generate a string containing a pipe-delimited list of all the parameter values that the user has selected (I did this in DAX too, but to be honest it’s better to use the SSRS Join function to do this outside the DAX query)
  2. Use the DAX PathContains() function in a Filter() to check whether the value on the current row of the table being filtered appears in the pipe-delimited list

Here’s a deliberately simplified example of how this works based on Adventure Works DW data. The following query filters the FactInternetSales fact table and returns the rows for the Sales Order Numbers that are present in the OrderList variable:

EVALUATE
VAR OrderList = "SO43713|SO43758|SO43784|SO43821"
RETURN
    FILTER (
        FactInternetSales,
        PATHCONTAINS ( OrderList, FactInternetSales[SalesOrderNumber] )
    )

The trouble with this approach is that it can be very slow. Running a trace in DAX Studio for the query above reveals the problem.


The presence of CallbackDataID shows that the Storage Engine is calling the Formula Engine to handle the use of PathContains() in the filter, and this is often a cause of poor query performance. However, back when I wrote the post, the only alternative was, as Chris Koester points out here, to dynamically generate the entire DAX query as an SSRS expression – and that is very painful to do.

The good news is that recent changes in DAX mean that there is another way to tackle this problem that can give much better performance. Here’s an example of this new approach:

EVALUATE
VAR OrderList = "SO43713|SO43758|SO43784|SO43821"
VAR OrderCount =
    PATHLENGTH ( OrderList )
VAR NumberTable =
    GENERATESERIES ( 1, OrderCount, 1 )
VAR OrderTable =
    GENERATE (
        NumberTable,
        VAR CurrentKey = [Value]
        RETURN
            ROW ( "Key", PATHITEM ( OrderList, CurrentKey ) )
    )
VAR GetKeyColumn =
    SELECTCOLUMNS ( OrderTable, "Key", [Key] )
VAR FilterTable =
    TREATAS ( GetKeyColumn, FactInternetSales[SalesOrderNumber] )
RETURN
    CALCULATETABLE ( FactInternetSales, FilterTable )

Broken down variable by variable, here’s how it works:

  1. OrderList is the pipe-delimited list of key values passed from SSRS
  2. OrderCount uses the PathLength() DAX function to find the number of parameter values in this list
  3. NumberTable uses the GenerateSeries() function to create a table of numbers with one row for each number between 1 and the number of parameter values in the list
  4. OrderTable uses the trick Marco describes here to iterate over NumberTable and, for each row, uses the PathItem() function to return the corresponding parameter value from the list
  5. GetKeyColumn uses the SelectColumns() DAX function to only return the column from OrderTable that contains the parameter values
  6. FilterTable uses the TreatAs() DAX function to take the table of values returned by GetKeyColumn and treat them as values in the FactInternetSales[SalesOrderNumber] column
  7. Finally, the query returns the contents of the FactInternetSales table filtered by the values in FilterTable using the CalculateTable() DAX function

There’s a lot of extra code here and in some cases you may find that performance with smaller data volumes is worse as a result, but in this particular case the new approach is twice as fast as the old one. There’s certainly no CallBackDataID this time.


Deprecated And Discontinued Functionality in SSAS 2017

In the past I’ve blogged about deprecated and discontinued functionality in SSAS 2014 and SSAS 2016; I forgot to check what’s deprecated and discontinued in SSAS 2017 until last week, but it turns out that there are a few things that are worth knowing.

Hereā€™s the link to the official documentation:

https://docs.microsoft.com/en-us/sql/analysis-services/analysis-services-backward-compatibility-sql2017?view=sql-analysis-services-2017

…and here are the definitions of ‘deprecated’ and ‘discontinued’:

A deprecated feature will be discontinued from the product in a future release, but is still supported and included in the current release to maintain backward compatibility. It’s recommended you discontinue using deprecated features in new and existing projects to maintain compatibility with future releases.

A discontinued feature was deprecated in an earlier release. It may continue to be included in the current release, but is no longer supported. Discontinued features may be removed entirely in a future release or update.

As far as discontinued features go it’s straightforward: everything that was deprecated in SSAS 2016 is now discontinued. For SSAS MD that means remote partitions, remote linked measure groups, dimension writeback and linked dimensions are now discontinued; I don’t think these features were ever used by more than a small number of people. Profiler is discontinued too and that’s more of a problem, given that the UI for Extended Events in SSMS remains awful and unusable for the kind of query performance tuning tasks I use Profiler for (I blogged about this issue here). The state of tooling for SSAS is already pretty bad and if Profiler stops working in the future the situation will be even worse; is it right that we have to rely on community-developed tools like DAX Studio and Analysis Services Query Analyzer, however good they are, for tasks like performance tuning?

UPDATE 30th April 2018: it turns out that Profiler was put on the ‘discontinued’ list by accident, and in fact is still only deprecated. The documentation has now been updated appropriately.

There are two important deprecated features:

  • SSAS Multidimensional data mining. Given that it has not had any new features for a long, long time (even longer than the rest of SSAS MD) and was never very popular in the first place, I’m not surprised. However the example of Microsoft’s first, failed attempt at bringing data mining to a wider audience is interesting in the light of the company’s attempts to do the same thing with Azure Machine Learning and other services. As far as I understand it the technology was never the problem and it was about as easy to use as it could be, so why did it fail? I’m not the right person to answer this question but I suspect the reasons include the following: Microsoft BI customers were not ready for data mining back when it was first launched; customers who did want data mining didn’t want to buy a product from Microsoft; very few Microsoft partners had the skills or experience to sell it; and finally, is it even possible to do proper data science in a user-friendly GUI with no coding?
  • SSAS Tabular models at the 1100 and 1103 compatibility levels (for SSAS 2012 and SSAS 2012 SP1). Anyone that is still running Tabular models at these compatibility levels really needs to upgrade, because they’re missing out on the great new features that have appeared in SSAS 2016 and 2017.