SQLBits 7 Registration Open

If you haven’t heard already (and I’m at least twelve hours late on this news) then the big news is that registration has opened for SQLBits 7. The conference will be taking place from Thursday September 30th to Saturday October 2nd in York, in the UK; to find out more about what’s going on, and to register, go to http://www.sqlbits.com/. At the time of writing 116 people have registered already, and we haven’t even published the agendas for the Friday or Saturday yet…!

I’m going to be running one of the training days that will be taking place on Thursday 30th too: a whole day on performance tuning Analysis Services 2008. I’ll be covering designing aggregations, partitioning, caching and cache warming, tuning queries and calculations, and hardware configuration; basically it will be a brain dump of everything I know on the subject and at a very modest price too…

The Deployment Wizard, and when “Retain Partitions” doesn’t retain partitions

I learned something the hard way this week about how the Deployment Wizard actually works (thank goodness for daily backups) that I thought I’d share.

I’ve used the Deployment Wizard lots of times to deploy changes to a SSAS database in production. If you have a system where new partitions are generated dynamically, for example in an SSIS package that creates new partitions automatically when new data is loaded into the cube, then these partitions are not going to be present in your project in BIDS and you don’t want them to be overwritten when you deploy the new version of your database. The Deployment Wizard allows you to stop this happening, and also to not overwrite other setting that might be different in dev and prod, such as connection strings and security roles.

However, what I didn’t realise was that in order for the Deployment Wizard to work properly in all cases it needs to be able to see the server you want to deploy to, and in my current customer’s environment you can’t see prod from dev. Working on dev, what I did was to edit the .deploymenttargets input file manually to enter the name of the database in prod I wanted to deploy to, then ran the wizard with the /a argument (see here for more information on these settings) to set other options in the input file such as Retain Partitions, then ran the wizard again with the /o and /d argument to generate the xmla deployment script without needing to connect to the target server (which I couldn’t, of course, see). And then, when I ran the resulting script in prod, I saw all the partitions that weren’t in dev disappear even though I’d selected the Retain Partitions option.

What I had assumed was that by selecting the Retain Partitions option the wizard would somehow generate the XMLA Alter command it outputs so that it would just ignore the partitions in each measure group. However this is not correct: you can’t have an Alter command for a database that doesn’t list all the partitions in all the measure groups in the database. This is why the wizard needs to connect to the target server: it scripts out all the partitions that currently exist on the target and inserts them into the script it generates, instead of just the partitions that are in the .asdatabase file. If, though, you use the /d argument you’re telling the wizard to use only the information that’s in the input files and not to connect to the target server to see what’s there, so it can’t know what partitions are present on the target server and it has to use the partitions from the .asdatabase file instead. So, if you use the /d argument, even if you specify the Retain Partitions option you may see partitions deleted on the target when you run the wizard’s script.

Building a Better Cache-Warmer, Part 2: The Formula Engine Cache

Sorry for the very long delay – at long last, here’s the companion piece to the post I wrote last September about cache-warming. That post dealt with the relatively straightforward topic of warming the Storage Engine cache; this time we’ll be looking at how to warm the Formula Engine cache. As I promised in the first post, the idea is that I’m going to take the ideas discussed here and use them to create a better cache warmer than the ones I’ve blogged about in the past.

The first thing to note when talking about caching the result of MDX calculations is that it does happen – a surprising number of people think that Analysis Services performs every MDX calculation needed by a query from scratch every time. This does indeed happen in worst-case scenarios – and it’s all too easy to do things which accidentally get you into these worst-case scenarios – but when SSAS can cache the result of calculations it can have a massive impact on query performance.

The second thing to note is that when I say that SSAS can cache the result of calculations, what I mean is that it can cache the values returned by cells in the cube which have been subject to some form of MDX calculation. Analysis Services can’t cache the result of MDX expressions as such, it can only do so when the result of those expressions is surfaced as a value returned by a cell somewhere in the cube. This means that if we create a calculated measure on the cube’s MDX Script, and then include this calculated measure in a query, we can expect that SSAS should be able to cache the values returned by this calculated measure. On the other hand if we have a complex MDX set expression on the Rows axis of our query, the set that expression returns cannot be cached – it will be re-evaluated every time the query is run.

The third, and possibly most important thing to remember is that SSAS can only cache the results of a calculation for as long as the calculation actually exists. As you may already know, there are three places you can create a calculated member in SSAS, and these equate to three different ‘lifetimes’. If you create a calculated measure on the MDX Script of the cube it will live until the next time the cube’s MDX Script is dropped and recreated, and this will happen if you explicitly clear the cache or you do any kind of processing. This is called ‘global’ scope. If, however, you create a calculated member in the WITH clause of a query then that calculated member will only exist for the lifetime of that query – so while you will benefit from caching while the query is executing, after the query has returned the calculated member will disappear and so will any related cache. This is called ‘query’ scope. The third scope, or lifetime, for creating a calculated member is ‘session’ scope: you can create calculated members that live for the lifetime of a session using the CREATE MEMBER statement, so values can only be cached for one user and one session; this is used only very rarely though.

The fourth to understand is that SSAS can’t cache the results of some calculations, or if it can cache them for one user it cannot share the contents of this cache with other users. For example this will happen if the MDX calculation might return a different result every time it runs (eg if it used the Now() function to return the system date and time – which is obviously going to be different each time it’s called) or if it might return different results for different users (eg when two different users run the same query through different security roles, because they might be able to see different parts of the cube the values returned might be different). The way in which a query is constructed can also force ‘query’ scoping for the formula engine cache too: the presence of a calculated member in the WITH clause can do this, as can the use of subselects. This also means that FE cache warming is pointless for certain popular client tools like SSRS or Excel because the MDX they generate prevents the use of the FE cache.

Finally, the Formula Engine has two different structures it uses to hold cached values. For calculations that execute in bulk mode then it can use the same structure that the Storage Engine cache uses, namely the data cache registry. For calculations that execute in cell-by-cell mode, however, it uses a different structure called the flat cache. Calculations that execute in bulk mode in most cases cannot make use of values stored in the flat cache, and calculations that execute in cell-by-cell mode cannot make use of values stored in the data cache registry. Furthermore, the size of the flat cache is restricted to 10% of the TotalMemoryLimit server property; if it grows bigger than that it will be completely emptied.

(Incidentally, if you’re looking for more detail on any of the above points, I suggest you watch the video of my session on “Cache Warming Strategies for SSAS 2008” here).

So clearly there are many potential pitfalls to watch out for when warming the FE cache, and indeed in many cases I’d say that it’s just easier to concentrate on warming the SE cache and tuning your calculations so they execute as fast as possible even on a cold FE cache!

If you do need to warm the FE cache though, you need to be very careful. Unfortunately the CREATE CACHE statement mentioned in the previous post only works for the SE cache, so the only way to warm the FE cache is to use MDX queries. These queries should be hand-written specifically for the purpose – if you simply record queries run in production using Profiler, it’s likely you’ll end up with queries that don’t warm the FE cache because they contain a WITH clause or a subselect – but it’s nonetheless a good idea to use production queries as a starting point and then modify them so they become FE-friendly. A smaller number of larger queries is going to be better than a large number of queries that return small amounts of data, to avoid cache fragmentation in the data cache registry, and it’s a good idea to keep them as simple (and as non-arbitrary-shaped) as possible.

Since it’s likely that the calculations you’re most interested in caching are going to be the ones that, despite your best efforts at tuning them, execute in cell-by-cell mode, then it’s a good idea to keep an eye on the overall size of the various flat caches in existence. The following DMV, I think, shows the size of all the flat caches across all the databases and for every session on an SSAS instance:

select * from
where object_id=’FormulaCache’

…but I need to do a bit more research to make sure it shows what I think it shows and to make sure I can observe this 10% limit resulting in the flat cache being emptied.

Lastly, as with the SE cache, you don’t want to overfill it and leave no memory for the natural growth in cache that will occur as a result of normal querying; and you certainly don’t want to go over the LowMemoryLimit or TotalMemoryLimit while you’re warming the cache and end up with cache evictions taking place.

Speaking at the PASS Summit 2010

Hurray! I’ve been selected to speak at this year’s PASS Summit in Seattle:

Looks like another strong lineup of speakers again; I hope they do a better job of scheduling the sessions though…

…and don’t forget, if you’re in the UK (or indeed Europe) and your budget doesn’t stretch to a flight to Seattle, there’s always SQLBits! Just look at the sessions we’ve had submitted so far:

The Excel Web App and its missing API

A few weeks ago I was playing around with the Excel Web App and thinking that, while it’s better than the Google equivalent in some ways, the lack of an API seriously limited its usefulness. So I posted a question on the Excel Web App forum asking what the plans for an API were and got the following answer:

Currently there are no APIs exposed for Excel Web App, and we are not sure if this will be available in the future.

This was more than a little disappointing, to say the least… So I replied with reasons why an API would be a good idea and got Jamie to join in (he and I have very similar views on this type of subject) as well. You can read the whole thread here:

In summary, what I’d like to see is the Excel Web App be able to do the following:

  • Consume data from multiple data source types, such as OData, and display that data in a table
  • Expose the data in a range or a table as an OData feed

I think it would enable all kinds of interesting scenarios where you need data to be both human-readable and also, at the same time, machine-readable: for example, imagine being able to publish data in a spreadsheet online and then have your business partners consume that data in PowerPivot at the click of a button. The posts on the thread go into a lot more detail so I’d encourage you to read it; also Jon Udell picked up the issue and blogged about it here:

And now I need your help: the Excel Web App dev team asked for specific scenarios where an API would prove useful and both Jamie and I provided some, but I think the more we get (and the more people show that they want an API) the better. So if you’ve got some ideas on how you would use an API for the Excel Web App then please post them on the thread! The more noise we make, the more likely it is we can change the dev team’s mind.

%d bloggers like this: