Excel PivotTable Query Performance With Analysis Services, Power Pivot And Power BI (Revisited)

Back in 2016 I wrote the following blog post about changes to the way Excel 365 generated MDX queries for PivotTables connected to Analysis Services, Power Pivot/the Excel Data Model and Power BI datasets:

https://blog.crossjoin.co.uk/2016/07/08/excel-2016-pivottable-mdx-changes-lead-to-big-query-performance-gains/

I know it sounds boring and not something you need to worry about but trust me, this is important – these changes solved the vast majority of Excel PivotTable performance problems that I encountered when I was a consultant so you should read the above post before continuing.

Unfortunately, earlier this year these changes had to be partially rolled back because in some rare cases the queries generated returned incorrect results; this means that you may find that values for subtotals and grand totals are again being returned even when they aren’t being displayed. The good news is that you should still be able to get the improved performance with a few minor tweaks.

Using the example from the previous post, a PivotTable connected to SSAS MD and the Adventure Works cube (the queries generated for SSAS Tabular, Power Pivot and Power BI may be slightly different and slightly better but the basic problem is the same), if you build the following in Excel:

image

image

…even though the subtotals in the PivotTable are not displayed in Excel, the MDX query generated by Excel not only returns them but also returns others that are not needed at all – in fact it returns 36 rows of data (although the query returns them as columns) rather than the 13 rows that are displayed. Here’s a screenshot of the results returned when you run the MDX query in SQL Server Management Studio:

image

image

Any time you see a row or column containing an All Member (in this case All Customers or All Products) you know you are looking at a subtotal or grand total.
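
To illustrate the point, here is some hand-written MDX against the Adventure Works cube – purely a sketch, not the query Excel actually generates, and using different hierarchies from the ones in the screenshots. The first query crossjoins only the members of the Gender and Category levels, so it returns just the detail values; the second uses the full hierarchies, whose members include the All members, so it returns subtotals and a grand total as well:

-- detail values only: no All members on rows
SELECT
{[Measures].[Internet Sales Amount]} ON 0,
[Customer].[Gender].[Gender].MEMBERS
*
[Product].[Category].[Category].MEMBERS
ON 1
FROM [Adventure Works]

-- the full hierarchies include the All members, so subtotals and a grand total come back too
SELECT
{[Measures].[Internet Sales Amount]} ON 0,
[Customer].[Gender].MEMBERS
*
[Product].[Category].MEMBERS
ON 1
FROM [Adventure Works]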

[NB The easiest way to get the MDX query generated by your PivotTable is to install the OLAP PivotTable Extensions add-in https://olappivottableextensions.github.io/]

In this case, to get an efficient query, you need to explicitly turn off subtotals and grand totals for the PivotTable in Excel:

image

image

In the PivotTable the only difference you’ll see is that the grand totals are now not displayed:

image

But the query generated now only returns the values that are actually needed, and as a result will be a lot more efficient and potentially a lot faster. In this case, the query now only returns the 12 rows of data displayed:

image

image

Even with subtotals and grand totals turned off there are still some cases where unwanted values will be returned. Take the following PivotTable, where I have used the Calendar hierarchy from the Date dimension on columns and filtered it so I only see the three months in Q1 CY 2012:

image

image

The MDX query generated by Excel requests four extra columns with subtotals and a grand total that aren’t displayed:

image

There is a fairly simple workaround though. By changing how the PivotTable is constructed – in this case by not putting a hierarchy on columns but just the Month Of Year attribute, then adding slicers for Calendar Year and Calendar Quarter to control which months are displayed – you can get an efficient query. This version of the PivotTable:

image

image

…results in this query being generated, which only returns the required values:

image

To sum up, then, if you have a PivotTable that seems a bit slow:

  • Turn off subtotals and grand totals unless you really need to see them
  • Look at the MDX query being generated by Excel and see if it is still returning unnecessary subtotals and grand totals; if it is, try rebuilding the PivotTable to get the same results in a different way (for example by not drilling down on hierarchies, as in the last example) so that a more efficient query is generated

[All the queries in this post were generated by Excel 365 version 1910 build 12130.20238]

The AutoSetDefaultInitialCatalog Analysis Services Server Property

In Shabnam Watson’s recent blog post on a bug she found when trying to create a Live connection from Power BI to Analysis Services she mentioned that the AutoSetDefaultInitialCatalog server property could be used to solve her problem. This piqued my interest because I’d seen this property but had no idea what it did exactly or why it was there. Luckily, now that I work for Microsoft, it’s even easier for me to find out about things like this from the dev team, and Akshai Mirchandani was able to help.

First of all, what does it do? The documentation on this property has just been added here, and this is what it says:

AutoSetDefaultInitialCatalog
A Boolean property. When set to true, new client connections automatically default to the first catalog (database) the user has permissions to connect to.
When set to false, no initial catalog is specified. Clients must select a default catalog prior to running queries or discover operations against a database on the server. If no default catalog is specified, an error is returned. If Initial Catalog property is specified in the connection string, the default catalog will be applied from this property.

The default value for this property is true.

Let me illustrate what this means. Say you have an instance of Analysis Services (in this case it’s Tabular, but it could be Multidimensional) with two databases on it:

image

I’ve expanded the Roles tab for each database for reasons that will become clear later.

Next, let’s say you run a simple trace on this server looking at the Discover End and Session Initialize events:

image

…and while this trace is running, you open up SQL Server Management Studio and connect to the SSAS instance. Here’s what you see in Profiler:

image

Now, just to be clear, all I did was open up SQL Server Management Studio and connect to the instance. I did not open up a DAX query window or anything like that; all that happened was the list of databases on the instance was displayed in the Object Explorer pane.

image

The interesting thing to notice from the trace above is that there are five Session Initialize events and, even though the Database column in Profiler is blank, you can see from the list of role names in the TextData column that in each case a connection has been made to the Adventure Works Internet Sales database.

This is because when you open a connection to Analysis Services and do not set the Initial Catalog connection string property, what happens is that you will get a connection to the default database on the instance. Which database is the default? It’s just the first database that the user has permission to access on the instance, which is a bit random.
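
If your client tool lets you edit the connection string you can avoid this by setting the Initial Catalog property explicitly – something like the following, where the server and database names are just placeholders:

Provider=MSOLAP;Data Source=MyServer;Initial Catalog=Adventure Works Internet Sales

SQL Server Management Studio, however, connects without an Initial Catalog when it enumerates the databases on an instance, which seems to be what the trace above is showing.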

This happens at other times too. Let’s say you right click on the EmptyDB database and process it in SQL Management Studio:

image

Here’s what I see in Profiler:

image

In this case there are three connections to the default database, Adventure Works Internet Sales, when the database I am processing is EmptyDB!

Most of the time these unnecessary connections have no impact at all but sometimes they can cause problems such as the ones Shabnam describes in her blog post. For example:

  • It can cause performance problems, because there is an overhead to opening a connection – for example roles are evaluated when a connection is opened
  • Monitoring and auditing gets complicated because, as you can see from the traces above, there are a whole lot of connections to the default database taking place that you aren’t expecting
  • Most importantly, when a connection is opened a read-commit lock is acquired on that database and in a few rare cases this can cause deadlocks and other locking-related issues

This is why the AutoSetDefaultInitialCatalog server property was introduced. With this server property set to False, when you open a connection to SSAS with no Initial Catalog set, you get a connection with no database set. You can find this server property in SQL Server Management Studio by right-clicking on your instance name in the Object Explorer, selecting Properties, going to the General page of the Analysis Server Properties dialog and checking the Show Advanced (All) Properties box.

image

With AutoSetDefaultInitialCatalog set to False, here’s what Profiler shows when I rerun my original test of connecting to SQL Server Management Studio:

image

Note that there are now no Session Initialize events.

Here’s what opening up a new MDX query window in SQL Management Studio shows with AutoSetDefaultInitialCatalog set to False if you don’t explicitly set a database when you connect:


image

image

Note the empty database dropdown box on the toolbar and the “Error loading metadata: No cubes were found” error message shown in the Metadata pane.

So why didn’t the dev team set AutoSetDefaultInitialCatalog to False by default on new instances? The problem with doing this is that it is a potential breaking change that could cause errors in some client tools. I’m not aware of any specific cases where this might happen but if you did decide to change AutoSetDefaultInitialCatalog to False on your instance you would need to test thoroughly to make sure it didn’t break anything. My feeling is, though, that it is probably a good idea to set AutoSetDefaultInitialCatalog to False on production servers and do the appropriate testing, just in case those unnecessary connections are causing problems.

Setting Azure Analysis Services Server Properties Not Visible In SQL Server Management Studio

Users of on-premises Analysis Services will know that while most of the useful server properties can be set in SQL Server Management Studio, some (such as MaxIntermediateRowsetSize) can only be set by editing the msmdsrv.ini file. How do you set these properties in Azure Analysis Services though, when there is no msmdsrv.ini file to edit?

The solution is to use an XMLA script to make the change. The easy way to do this is to open up the server properties dialog in SQL Management Studio by right clicking on your instance name in the Object Explorer and selecting Properties:

image

Then, in the server properties dialog, change any server property but do not click OK. Instead, click on the Script button and then select Script Action to New Query Window:

image

This will create a new XMLA query window in SSMS containing the script for the server property change you just made (the connection dialog for this new window will open too and will freeze the server properties dialog behind it, so you’ll need to either connect or dismiss the connection dialog before you can close the server properties dialog). The actual change won’t take place, though, unless you execute the script – so don’t do that.

Instead, change the name of the server property in the script to the one you actually want to set and enter the value you want to set it to:

image

Note that you can’t just enter the name of the server property in most cases because server properties can be grouped into sections, so you’ll need to enter the section names too. For example for the MaxIntermediateRowsetSize property you’ll need to enter DAX\DQ\MaxIntermediateRowsetSize.
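
To give you an idea of the end result, here’s a rough sketch of the kind of script SSMS generates once you’ve edited it – I’ve trimmed the namespaces and the server ID/Name, which in your script will already be filled in with your server’s details, and the value shown here is just an example:

<Alter AllowCreate="true" ObjectExpansion="ObjectProperties" xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Object />
  <ObjectDefinition>
    <Server>
      <ID>YourServerID</ID>
      <Name>YourServerName</Name>
      <ServerProperties>
        <ServerProperty>
          <Name>DAX\DQ\MaxIntermediateRowsetSize</Name>
          <Value>2000000</Value>
        </ServerProperty>
      </ServerProperties>
    </Server>
  </ObjectDefinition>
</Alter>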

After that, all you need to do is hit the Execute button and the change will be made.

DAX Median() Function Does Not Work On Tables With More Than 2 Billion Rows

An interesting – if obscure – fact I learned recently is that a small number of DAX functions such as Median() do not work on tables with more than 2 billion rows in Analysis Services Tabular, Azure AS and Power BI.

It’s quite easy to reproduce in Power BI. The following M expression returns a table with two billion and four rows:

let
    Source = 
    List.Repeat(
        {1,2,3,4},
        500000001
        ),
    #"Converted to Table" = 
    Table.FromList(
        Source, 
        Splitter.SplitByNothing(), 
        null, 
        null, 
        ExtraValues.Error
        ),
    #"Changed Type" = 
    Table.TransformColumnTypes(
        #"Converted to Table",
        {{"Column1", Int64.Type}}
        )
in
    #"Changed Type"

It takes some time to load this table  – around twenty minutes – but because there are only four distinct values in the table the resulting .pbix file is only 31KB thanks to the way Power BI compresses data.

If you load this table into your dataset, call it VeryBigTable and create the following measure:

Median Test = MEDIAN(VeryBigTable[Column1])

…and use the measure in a visual, you’ll see the following error:

image

The current query cannot be evaluated for the ‘VeryBigTable (42)’ table, because the table contains more than two billion rows.

What’s more, the error will always occur even if you apply a filter to the table that returns less than two billion rows. The same problem occurs with some other functions, such as Percentile(), but it’s worth pointing out that the vast majority of DAX functions work as normal with tables with more than two billion rows – for example, in the pbix file used here the Sum() and CountRows() functions not only work fine but return instantly.

Luckily, in the case of the Median() function, there is an easy workaround because you can calculate a median in other ways such as the one described on the DAX Patterns site here. The code is a lot more verbose but it works on a 2 billion+ row table.
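
As a rough sketch of what that kind of workaround looks like, adapted for the table used here (note that this cumulative-count approach returns the higher of the two middle values when the row count is even, so it can differ slightly from what MEDIAN() would return):

Median Workaround =
MINX (
    FILTER (
        VALUES ( VeryBigTable[Column1] ),
        VAR CurrentValue = VeryBigTable[Column1]
        RETURN
            CALCULATE (
                COUNTROWS ( VeryBigTable ),
                VeryBigTable[Column1] <= CurrentValue
            ) > COUNTROWS ( VeryBigTable ) / 2
    ),
    VeryBigTable[Column1]
)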

image

SSAS Tabular 2019, Calculation Groups And Migration From SSAS Multidimensional

With the release of CTP 2.3 of SQL Server 2019 today there was big news for Analysis Services Tabular developers: Calculation Groups. You can read all about them in detail in this blog post:

https://blogs.msdn.microsoft.com/analysisservices/2019/03/01/whats-new-for-sql-server-2019-analysis-services-ctp-2-3/

In my opinion this is the most important new feature in DAX since… well, forever. It allows you to create a new type of calculation – which in most cases will be a time intelligence calculation like a year-to-date or a previous period growth – that can be applied to multiple measures; basically the same thing that we have been doing in SSAS Multidimensional for years with the time utility/shell/date tool dimension technique. It’s certainly going to solve a lot of problems for a lot of SSAS Tabular implementations, many of which have hundreds or even thousands of measures for every combination of base measure and calculation type needed.
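
To give you a flavour of what this looks like, a calculation item in a calculation group is just a DAX expression that uses SELECTEDMEASURE() to refer to whatever measure is currently in context. Here’s a minimal sketch of two typical time intelligence calculation items, assuming a Date table with a Date column marked as a date table:

-- "YTD" calculation item
CALCULATE (
    SELECTEDMEASURE (),
    DATESYTD ( 'Date'[Date] )
)

-- "Previous Year" calculation item
CALCULATE (
    SELECTEDMEASURE (),
    SAMEPERIODLASTYEAR ( 'Date'[Date] )
)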

I’m not going to repeat any of the detailed technical information in the blog post here, though. Instead the point I want to make is that this is very big news for SSAS Multidimensional users too. In the past couple of years many people who have existing SSAS Multidimensional implementations have thought about migrating to SSAS Tabular so they can take advantage of its new features or move to the cloud, and indeed many of them have already migrated successfully. However, up to now, the biggest blocker for those wanting to migrate from Multidimensional to Tabular has been the fact that some complex calculations that can be expressed in MDX cannot be recreated (or recreated easily and efficiently) in DAX, because DAX has not had an equivalent of calculated members not on the Measures dimension or the MDX SCOPE statement.

Calculation groups do not remove this roadblock completely, but they do remove the roadblock for a large group of existing SSAS Multidimensional users whose only complex calculation requirement is a time utility/shell/date tool dimension. As a result these SSAS Multidimensional users will now be able to migrate to SSAS Tabular 2019, Azure Analysis Services or Power BI if they want to. Only those people who have more exotic uses for calculated members not on the Measures dimension (which are not very common at all) and those who use SCOPE statements (a larger group – many people working with financial data use SCOPE statements heavily) will find that Multidimensional is still the correct platform for them.

Azure Data Studio Should Support Analysis Services And Power BI Premium Capacities

I’m at the PASS Summit this week, and in this morning’s keynote there was a demo of the newly-released Azure Data Studio  – a modern, cross-platform tool for managing and querying SQL Server, Azure SQL Database and other Azure data services (it’s carefully described as “complementary to” SQL Server Management Studio rather than a replacement for it; this blog post has a detailed discussion of this question).

This video provides a good, short overview of what it is:

I think it’s pretty cool, BUT… it doesn’t support Analysis Services. I had a moan about this and the generally poor state of Analysis Services tooling on Twitter, was invited to meet some of the developers and was told that if enough people request Analysis Services support it might happen.

What would support for Analysis Services involve? The following springs to mind:

  • I’d like to be able to connect to and manage Analysis Services Multidimensional and Tabular on-premises and Azure Analysis Services; if that’s too ambitious I could settle for supporting only Analysis Services Tabular 2016+ and Azure Analysis Services.
  • Since we will soon be able to connect to a Power BI Premium capacity as if it was an Analysis Services instance via XMLA endpoints, I would want to be able to connect to Power BI Premium capacity too.
  • I’d want to be able to run DAX and M queries, and ideally MDX queries too.
  • I would also want to be able to work with ASSL and TMSL for scripting and editing objects.
  • Azure Data Studio has a Profiler extension that works on xEvents; it would be great if that worked with Analysis Services xEvents too.
  • DAX and M Jupyter notebooks would be really useful!
  • It would make sense for some of the functionality of existing tools like DAX Studio and BISM Normalizer to be turned into extensions.

If you want to see Analysis Services support in Azure Data Studio, go to the following issue on the Azure Data Studio GitHub repository:

https://github.com/Microsoft/azuredatastudio/issues/1026

…and click the thumbs-up icon on the first post:

AzureDataStudio

Let’s make our voices heard!

 

 

A Quick Look At Some Power BI And SSAS-Related Products And Books

I don’t like writing reviews of books or products here on my blog for a couple of reasons, the main one being that I don’t usually have the time to read/test/understand something properly so I can write a thorough review. That said, I do get sent a lot of free books and evaluation licences for products that deserve a wider audience, so I thought I would write a post rounding up some of them along with a few thoughts of my own.

Custom Visuals

One of the most interesting questions related to Power BI is whether third-party software companies will be able to build businesses selling extensions to it. The most obvious way that Power BI can be extended is through custom visuals and there are several companies that have paid-for (as opposed to free) custom visuals. Zebra BI is one such company and I’ve been really impressed by what they have produced for visualising financial data:

image

I also saw recently that OKViz (part of the Marco and Alberto/SQLBI family) now have a paid-for version of their excellent Smart Filter visual with some premium features – see here for more details; similarly new features in Klaus Birringer’s Ultimate Waterfall and Ultimate Decomposition Tree visuals are only available in the paid version.

I know many Power BI users who use custom visuals have suffered with various bugs and limitations in functionality over the past few years, and I think buying commercial custom visuals rather than relying on free equivalents is one way of dealing with reliability and support problems. It’s certainly in Microsoft’s interests to have a thriving partner community in this space given that flashy visuals are a major selling point of the product. But will Power BI users want to pay for visuals when so much is available out of the box for free, especially when the cost of the visuals seems relatively high when compared to the overall cost of Power BI? I guess we’ll see.

Custom Connectors

A lot of what I’ve just said about custom visuals also applies to custom connectors, although custom connectors are a lot less mature (at the time of writing, support for custom connectors in the on-premises gateway is still in preview). However I was pleased to see this announcement from CData software that they now have over 100 custom connectors available for Power BI. It looks like what they have done is wrapped their existing ODBC providers, and as a result some of their connectors are for sources that are already available in Power BI, but even so there are a lot of new data sources here.

Incidentally, I got very, very excited when I realised that the CData connectors for Excel and Excel Online supported DirectQuery mode as well as import mode. Why, I hear you ask? Well, just think about a planning/budgeting solution where users can enter data into an Excel spreadsheet and when the numbers change in Excel, the numbers change in Power BI too; think also how this could work with Composite Models. I tried this with CData’s Excel on-premises connector and unfortunately it returned errors when the source Excel worksheet was open; I did get it to work with the Excel Online connector but it was painfully slow, even with a small amount of data. If I can get it to work better (and I may be missing some optimisations within the connector) I’ll blog about it.

Books

One of the few Power BI-related books that have been published recently is Phil Seamark’s “Beginning DAX with Power BI”. He was kind enough to send me a review copy; it’s a good introduction to the subject and I particularly like the way he introduces DAX variables early on. Definitely worth a look if you’re just starting to learn DAX.

I was also sent a copy of a slightly older book, David Parker’s “Mastering Data Visualization with Visio 2016”. David knows pretty much all there is to know about using Visio for BI (his blog is great) and while this book doesn’t cover the most exciting new development in this area – the Visio custom visual for Power BI – if you want to learn all the advanced features of Visio that you could take advantage of in Power BI then this is the book to get.

Other Products

I’ve been a big fan of SentryOne’s SSAS monitoring tool, BI Sentry, for years now but up until recently it only supported SSAS Multidimensional. It now supports SSAS Tabular too (details here), and it looks like SentryOne have done a great job of adapting it to the specific needs of the Tabular engine. I always advise my SSAS customers to invest in some kind of monitoring solution because it makes the job of detecting and solving issues like poor query performance so much easier, and to be honest BI Sentry is better than anything you would be able to build yourself.

Moving onto Power BI, if you need to generate documentation for your Power BI datasets and reports check out Power BI Documenter; the August release looks like it has some cool new features. Alternatively the latest release of Power BI Helper also allows you to generate documentation as well as lots of other useful stuff.

Something that hasn’t been properly released yet, but will be incredibly useful when it is, is MAQ Software’s Application Lifecycle Management Toolkit for Power BI. Closely related to BISM Normalizer, it will allow you to compare two Power BI datasets, merge changes, deploy only parts of a dataset (for example individual measures), and deploy to multiple datasets – all of which are things Power BI developers have been crying out for.

Last of all, the guys at DevScope also have a new(ish) product out, Power BI Robots, which automatically takes screenshots of Power BI reports and dashboards and can deliver them to various destinations such as email addresses and SharePoint. I haven’t looked at it yet but it seems like it could have a lot of interesting uses.

How The New IsAvailableInMDX Property For Analysis Services Tabular Can Reduce Memory Usage And Speed Up Processing

Following on from my post the other week about the new RowsetSerializationLimit server property, I thought it would be a good idea to write about why the new IsAvailableInMDX property (announced in the same blog post) is so important. In fact, I would say that everyone using Analysis Services Tabular 2017 (CU7 or higher) or Azure Analysis Services should spend some time investigating it because the potential benefits in terms of reduced memory usage and faster processing times are significant, especially for larger models.

First of all, what does it actually do? As the blog post says, it allows you to stop attribute hierarchies from being built on columns when you don’t need them. But what are attribute hierarchies? They are structures that are used only when you are querying your Tabular model using MDX; Excel PivotTables, for example, generate MDX queries when they are connected to Analysis Services Tabular whereas Power BI always generates DAX queries. An attribute hierarchy allows a column on a table to be used on the rows or columns axis of an MDX query, and in Excel that means you will be able to drag that field onto the rows or columns area of a PivotTable. Attribute hierarchies are used by some DAX functionality too – for example the TreatAs() function (at least for now) needs them to be present to work. Frustratingly, the DAX functionality that does need attribute hierarchies is not documented.
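
For reference, a typical TreatAs() call looks something like the following sketch – the measure, table and column names here are purely illustrative – and, as described above, it currently relies on the attribute hierarchy for the filtered column being present:

Bikes And Accessories Sales =
CALCULATE (
    [Sales Amount],
    TREATAS ( { "Bikes", "Accessories" }, 'Product'[Category] )
)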

To give you an example, consider a Tabular model that contains a table with three columns, Product, Customer and Sales, and a measure that sums up the values in the Sales column.

image

 

I can query this Tabular model in Power BI, for example by creating a Matrix visualisation:

image

I can also get the same values out using an Excel PivotTable:

image

Now the reason I can create this PivotTable is that Analysis Services Tabular has created attribute hierarchies on the Customer and Product columns. However, the important thing to understand is that Analysis Services Tabular creates attribute hierarchies on every column on every table by default, including the Sales column. This allows me to create a PivotTable like this, with the distinct values from Sales on the rows of the PivotTable:

image

image

You’re probably thinking, why would I ever want to use Sales – a measure column – like this? And the answer is you probably wouldn’t, even though Tabular allows this by default. What’s more, building the attribute hierarchy for Sales makes processing slower and the resulting hierarchy uses memory, so all this comes as a cost. The IsAvailableInMDX property is therefore very useful because it allows you to stop attribute hierarchies from being built on columns like Sales where they serve no real purpose.
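
To put that in MDX terms, the attribute hierarchy on the Sales column is what makes a query like the following sketch possible – I’m assuming here that the table is called Sales and that the cube name is Model, which is the default for a Tabular model – putting the distinct Sales values on rows just like the PivotTable above. Once the attribute hierarchy is no longer built, queries like this (and the equivalent PivotTable layout) stop being possible:

SELECT
{[Measures].[Sales Amount]} ON 0,
[Sales].[Sales].[Sales].MEMBERS ON 1
FROM [Model]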

Unfortunately at the time of writing SSDT doesn’t allow you to set the IsAvailableInMDX property but the good news is that the latest versions of Tabular Editor do:

image

Setting IsAvailableInMDX to false for the Sales field has no impact at all in Power BI, so long as you are not using functionality like TreatAs() that needs it. In Excel, it just means that it is no longer possible to drag Sales onto rows or columns in a PivotTable – the Sales Amount measure still works:

image

As a result, there are two recommendations that can be made:

  • If you are not using any client tools that generate MDX queries (such as Excel) or you want to prevent your users from using them, and you can be sure that it does not affect any of your existing Power BI reports or DAX calculations, you can set IsAvailableInMDX to false on every column of every table
  • If you are using client tools that generate MDX you can still probably set IsAvailableInMDX to false on every measure column and not lose any important functionality

How much of an impact will doing this have on processing times and memory usage? It depends, but it could be a lot. The anecdotal evidence on Twitter is promising:

image

image

I did my own (not particularly scientific) test using a table with five million rows and ten columns, each of which contained the integers between one and five million. Here’s the M query to generate such a table without the need for an external data source:

let
    Source =
        Table.FromColumns(
            List.Repeat(
                {{1..5000000}},
                10
            )
        ),
    #"Changed Type" =
        Table.TransformColumnTypes(
            Source,
            List.Transform(
                Table.ColumnNames(Source),
                each {_, Int64.Type}
            )
        )
in
    #"Changed Type"

On my laptop, with IsAvailableInMDX set to true for all ten columns, a full process on this table took around 105 seconds and the table size reported by Vertipaq Analyzer was 381MB. After changing IsAvailableInMDX to false for all ten columns, the time for a full process went down to around 81 seconds and the table size was down to 191MB.

In summary, this is one of those seemingly obscure technical changes that turns out to be way more useful than you might think. If you test out setting IsAvailableInMDX on your Tabular model, please leave a comment letting me know what kind of impact it had!

[Thanks to Daniel Otykier for providing a lot of information for this post]

More Details On The RowsetSerializationLimit Server Property For Analysis Services

The recent blog post on the Analysis Services team blog about new memory options contained information about some very interesting functionality that has just been added. The first of the new features I decided to try was the RowsetSerializationLimit server property, which restricts the number of rows returned by a query and which is intended to stop rogue users running queries that return very large amounts of data. It certainly works but there are two things to point out about it that aren’t immediately obvious.

First of all, an example of it in action. In Azure Analysis Services, using a database that contains one very simple table with one column and 5000 rows, the following DAX query returns all 5000 rows from the table:

EVALUATE 'MyTable'

image

To change the RowsetSerializationLimit server property, open SQL Server Management Studio, right-click on your server name in the Object Explorer pane and select Properties. Then go to the General page, check the Show Advanced (All) Properties box, and you’ll see the property listed under OLAP\Query\RowsetSerializationLimit:

image

The default value is –1, meaning no limit is placed on the number of rows returned by a rowset. Changing it to 100 and then running the above query results in the following error:

image

Executing the query …
Obtained object of type: Microsoft.AnalysisServices.AdomdClient.AdomdDataReader
The maximum number of rows (100) was reached.

A query that returns fewer than 100 rows, for example

EVALUATE TOPN(99,'MyTable')

…is ok though.

The important thing to understand is that although this applies to both DAX and MDX queries, it only applies to results returned as rowsets (a tabular format) and not as cellsets (a multidimensional format). Most client tools that generate MDX, including Excel, use cellsets, so unfortunately this property will not apply to them.

For example, the following MDX query run from a SQL Server Management Studio MDX query window (which also returns a cellset) on the same database runs ok and returns 5000 rows:

SELECT
{[Measures].[Number Of Rows]} ON 0,
[MyTable].[Column1].[Column1].MEMBERS ON 1
FROM
[Model]

 

image

However the same query run from Power BI Desktop to import data from the same database:

image

…runs, but returns only 100 rows and then an error:

image

Something else to point out is that this applies to all queries that return rowsets, even Discover requests and DMV queries. As a result, setting this property to a very low value can cause problems in tools like SQL Server Management Studio: for example if you set the property to 10 and you had 11 tables in a database, you would see an error when you tried to expand the Tables node of a database in the Object Explorer pane!

Even though this property only applies to rowsets I think it’s still going to be very useful, especially in scenarios where Power BI Desktop users are importing vast amounts of data from Analysis Services and causing major problems on the server.

How Much Does Azure Analysis Services Actually Cost?

It might seem strange to write a blog post on how much Azure Analysis Services costs when there is a page on the Microsoft site that very clearly lists all of the SKUs and how much they cost per hour or per month in any currency and Azure region you choose:

https://azure.microsoft.com/en-gb/pricing/details/analysis-services/

The prices you see here are somewhat misleading though. They’re not wrong, but they do not challenge the assumption that you will use Azure Analysis Services in the same way as an on-premises instance, and this allows you to make further, incorrect assumptions about cost. A lot of my customers look at the monthly cost and assume that’s what they will actually be paying monthly – and more often than not come to the conclusion that Azure Analysis Services is too expensive for them. In fact the situation is a lot more complex and Azure Analysis Services (especially at the enterprise level) might end up being cheaper than you think. Let me explain why.

The workload of any Analysis Services instance, whether on-premises or in Azure, varies a lot. For example:

  • It will be busy during office hours while users are running queries but much quieter at night when most users have gone home
  • Similarly, it will be busy during the week when most users are working and much quieter on weekends and public holidays
  • It will be busy at certain times, such as month-ends or Black Friday, when more reports need to be run
  • It will be much busier (often with both CPU and memory usage at their peak) when processing is taking place

When you are planning an on-premises deployment of Analysis Services you need to specify your hardware and licensing so as to be able to handle these periods of high usage, even if for most of the time usage is a lot lower. This means that enterprise-level deployments of Analysis Services can be expensive because you need servers with a large number of cores and a lot of RAM and you may also need to use network load-balancing to scale out over several servers.

On the other hand Azure Analysis Services is able to scale up and scale out on demand, and you only pay for what you use. Scaling up means moving to a higher performance level (ie a SKU) within a service tier, or even moving up a service tier. Scaling out means adding replicas of your existing Azure AS instance and database.

Broadly speaking you need to scale up in two scenarios:

  • To handle the need for more memory and more QPUs while you are processing
  • To handle increased data volumes, either as a result of new tables being added to the model or because the size of existing tables has increased over time

You need to scale out when:

  • You need to process during times when other users are running queries, to ensure that query performance is not affected
  • You need to handle an increased number of concurrent users running queries

Bill Anton has an excellent blog post covering this question in a lot more detail here:

http://byobi.com/2017/11/when-to-scale-up-or-scale-out-with-azure-analysis-services/

One other thing to point out is that if you pause an instance of Azure Analysis Services you pay absolutely nothing.

What does this mean for the cost of Azure Analysis Services? Basically, if you’re taking advantage of these features you won’t pay one of the monthly prices quoted on the pricing page linked to at the top of this post. Instead you may do things like:

  • Scale up for one hour every day when you need to process your SSAS database, just to get the extra memory and QPUs needed, then scale down when processing has finished
  • Scale out only on certain days, or certain times of day, to handle increased numbers of users
  • Pause your instance when you are sure that no-one needs to run queries

How do you then calculate the likely cost? For my Azure Analysis Services precon at SQLBits a few months ago I built an Excel workbook that shows how to go about this. First, there is a table with the hourly costs for S-level instances in GBP:

image

Then there is a table with one cell for every hour of every day of the week, showing the performance level required for that hour. In this example an S1 instance is required most of the time, except at midnight every night when processing is taking place and an S2 is needed. For four hours on Sunday morning the instance is paused.

image

Next there is a similar table showing the number of scale-out replicas needed for each hour of each day. In this case scale-out is needed for four hours of the day, Monday to Friday, to handle a larger number of concurrent users:

image

Next, the data from these three tables is brought together to calculate the cost per hour for each day of the week:

image
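
The logic behind this step is simple. Assuming – as I understand it – that each query replica is billed at the same hourly rate as the instance itself:

cost for a given hour = hourly rate of the performance level used in that hour x (1 + number of scale-out replicas in that hour)
cost for a day = the sum of the costs for each of the 24 hours, with paused hours costing nothing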

Finally, for a given month the actual cost per day can be calculated (which of course varies by the number of weekdays, weekends and maybe also public holidays), resulting in the true monthly cost:

image

In this example the monthly cost for January 2018 is £1287.56, which is only slightly more than the £1104.48 you would pay to have an S1 instance for a whole month. You can download a copy of the workbook here.

In summary, my point here is that pricing an implementation of Azure Analysis Services is complex because of its flexibility. I am not saying that Azure Analysis Services is cheap, or cheaper than using Analysis Services on a VM in Azure or on premises – that’s a subject for a completely separate discussion. Hopefully, though, this post gives you a better idea of how much you might pay if you do use Azure Analysis Services for a project.

UPDATE 10th June 2017: the number of query replicas for scale-out has been limited by Azure region, so choosing the right Azure region is very important if you do need to scale out. The documentation is here: https://docs.microsoft.com/en-us/azure/analysis-services/analysis-services-overview#availability-by-region
