Build Your Own Analysis Services Cache-Warmer in Integration Services
Cache-warming is one of the most neglected performance-tuning techniques for Analysis Services: perhaps it seems too much like cheating? Yet almost everyone knows how much difference there can be between running a query on a cold cache and on a warm cache, so there's really no excuse not to be doing it, especially if you know in advance what queries your users are likely to be running. AS2005's caching mechanism is more complex than I can describe here (or than I can describe full stop, although I hear that the recently published "Microsoft Analysis Services 2005" has some great information on this front), but a lot of the time it can cache the raw data of the cube and quite often the results of calculations too. You'll need to test your own cubes and queries to find out exactly how much you'll benefit, but almost every cube benefits to a noticeable extent.
I've recently implemented a simple cache-warming system for a few customers, and I thought I'd share the details. Now, I know that the documentation for ascmd explains how you can use it for this purpose (see http://msdn2.microsoft.com/en-us/library/ms365187.aspx for details) but I didn't go down this route for a number of reasons:
- That example uses a batch file, and I preferred to keep all my logic in SSIS, especially since these customers were already using it for cube processing.
- I wanted to avoid making my customers get their hands dirty extracting the MDX needed. They were using Excel 2003 as their main client and, as you may know, Excel 2003 makes heavy use of session sets, so extracting and modifying the MDX it generates to create standalone MDX statements would have been too much to ask.
Here's what I did instead. First, I created a new SQL Server database to store all the queries I was going to use. Then I used Profiler to capture the necessary MDX: I made sure no-one else was connected to the server, started a trace which captured only the Query Begin event and included the TextData column, got the user to open Excel and construct and run their query, then stopped the trace and saved the results to a table in my new database. After doing this a few times I ended up with several tables, each of which contained the MDX generated for a particular sequence of queries in Excel.
Next I created an SSIS package which took each of these queries and executed them. Here's what it looked like:
The outermost ForEach container used an SMO enumerator to loop through every table in my SQL Server database and put the table name in a variable (the expression generated by the UI was Database[@Name='CacheWarmer']/SMOEnumObj[@Name='Tables']/SMOEnumType[@Name='Names']). Next, a Script task used this table name to build a SQL SELECT statement which returned every query in the current table, and put that statement in another variable. Here's the code:
Dts.Variables("GetMDXQueries").Value = "SELECT TextData FROM [" + Dts.Variables("TableName").Value.ToString() + "] WHERE DatabaseName='" + Dts.Variables("ASDatabaseName").Value.ToString() + "'"
Next I used an Execute SQL task to execute this statement and put the resultset into another variable, whose rows I then looped over in the innermost ForEach loop using an ADO enumerator. Inside this second loop, a Script task got the MDX query out of the current row and into a string variable as follows:
Dts.Variables("MDXQueryString").Value = Dts.Variables("MDXQueryObject").Value.ToString()
Then I used another Execute SQL task, connected to my cube, to run the MDX query. I've been unable to execute MDX queries inside a Data Flow task (except by going via SQL Server using linked servers, which is nasty), hence the use of an Execute SQL task here; I also found that I had to use an ADO connection to my cube, because if I used an OLE DB connection all my queries ran twice for some reason. I set the RetainSameConnection property on the connection to the cube to true so that queries which relied on session-scoped sets created earlier in the workflow didn't fail; nonetheless, I also set the FailPackageOnFailure and FailParentOnFailure properties of the Execute SQL task to false just in case. I was then able to save the package to my server and use SQL Server Agent to execute it immediately after cube processing had finished.
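For anyone who wants to see the shape of the logic outside SSIS, here's a rough Python sketch of the same two-level loop. Everything in it is hypothetical: sqlite3 stands in for the CacheWarmer SQL Server database, the table and query contents are made up, and execute_mdx is just a stub for the Execute SQL task that would run each captured query against the cube.

```python
import sqlite3

executed = []

def execute_mdx(query):
    # Stub: in a real cache-warmer this would run the MDX against the cube
    # over an Analysis Services connection; here we just record the query.
    executed.append(query)

# Stand-in for the CacheWarmer database holding the Profiler trace tables
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
for table in ("Query1", "Query2"):
    cur.execute(f"CREATE TABLE [{table}] (TextData TEXT, DatabaseName TEXT)")
    cur.execute(f"INSERT INTO [{table}] VALUES (?, ?)",
                (f"SELECT ... FROM [Adventure Works] -- captured in {table}",
                 "Adventure Works DW"))
conn.commit()

as_database = "Adventure Works DW"

# Outer loop: every table in the database (the SMO enumerator in SSIS)
tables = [row[0] for row in
          cur.execute("SELECT name FROM sqlite_master WHERE type='table'")]
for table_name in tables:
    # Inner loop: every captured MDX query in the current table
    # (the ADO enumerator over the Execute SQL task's resultset)
    for (mdx,) in cur.execute(
            f"SELECT TextData FROM [{table_name}] WHERE DatabaseName=?",
            (as_database,)):
        execute_mdx(mdx)

print(len(executed))  # number of warm-up queries run
```

Filtering on DatabaseName matters if, like me, you have traces from more than one Analysis Services database in the same table.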
As I said, if you implement a cache-warming system you'll want to test how much difference it makes to your query performance. The easiest way to do this is to clear the cache and then run the package twice, noting how long each run takes: the difference between the two timings is the difference between a cold and a warm cache. To clear the cache you can either restart the Analysis Services service or run a ClearCache command in an XMLA query window in SQL Server Management Studio. Here's an example of the latter which clears the cache of the Adventure Works database:
<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <ClearCache>
    <Object>
      <DatabaseID>Adventure Works DW</DatabaseID>
    </Object>
  </ClearCache>
</Batch>
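The cold-versus-warm measurement itself boils down to timing the same workload twice. Here's a minimal Python sketch, where run_queries is a hypothetical placeholder for executing the cache-warming package (it just sleeps, to simulate a slow cold run followed by a faster warm one):

```python
import time

def run_queries(cache):
    # Placeholder for running the warm-up package: the first run (cold
    # cache) is slower than subsequent runs (warm cache).
    time.sleep(0.2 if cache["cold"] else 0.05)
    cache["cold"] = False

cache = {"cold": True}

start = time.perf_counter()
run_queries(cache)          # first run: cold cache
cold_duration = time.perf_counter() - start

start = time.perf_counter()
run_queries(cache)          # second run: warm cache
warm_duration = time.perf_counter() - start

# The gap between the two timings approximates the benefit of a warm cache
print(f"cold: {cold_duration:.2f}s, warm: {warm_duration:.2f}s")
```

With a real cube you'd clear the cache first (as above) and time the package itself, but the arithmetic is the same.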
Now, I will admit that I'm not the world's greatest SSIS expert, so if anyone has any suggestions for improving this I'd be pleased to hear them. Please test them first though: as I mentioned above, I found SSIS didn't always work as I expected with SSAS as a data source! I've also created a similar package which connects to the query log AS creates for usage-based optimisation, reads the data in there and uses it to construct MDX queries which it then runs against the cube. This has the advantage of removing the need for anyone to extract any MDX from anywhere; plus, the queries it constructs return very large amounts of data, so you can use up all that memory you get on 64-bit boxes. The problem is that at the moment some of the queries it constructs are far too big and take forever to run… when I've worked out how I want to break them up into smaller chunks I'll blog about it.
UPDATE: Allan Mitchell has very kindly done some more research on what happens when you try to run an MDX query through an Execute SQL task and written it up here:
http://wiki.sqlis.com/default.aspx/SQLISWiki/ExecuteSQLTaskExecutesMDXQueryMoreThanOnce.html
Microsoft BI Conference in May
Second Blog Birthday
Today is my blog’s second birthday; two years is pretty ancient in blog years I think. Thanks everyone for sticking with me this long!
Professionally, this year has seen a lot of changes for me: I’ve left the world of permie work and set up my own company, been made an MVP and seen my name on the front cover of a book for the first time. Working as a freelancer has certainly provided me with a lot of good material for this blog (watch out for some interesting stuff on cache warming and query parallelism in the next few weeks) even if it’s meant I’ve had much less time to write it up; the blog has, in turn, proved to be a good source of advertising for my consultancy work. I’m really enjoying myself at the moment – I hope 2007 is as good as 2006 has been.
Resolutions for next year:
- Post more on the Analysis Services MSDN Forum and microsoft.public.sqlserver.olap. I’ve been a bit slack over the last few months.
- Get myself some of these new BI certifications that have come out.
- Stretch myself more, technically. It’s too easy to stick around in my Analysis Services comfort zone. I need to improve my SSIS and C# skills for instance.
- Get into PerformancePoint.
- Lose weight. I was with a customer a few weeks ago and one of the guys there remarked that I looked older and fatter than I did in the picture on my blog (what Jon calls my ‘BI in the Jungle’ picture), although I looked thinner than the last time he’d seen me earlier in the year. Hmm, this is the result of working from home and not getting enough exercise.
- Keep having fun…
SP2 CTP3
Santa Dashboard
Santa Claus, Director of Christmas joy and happiness
Deployment Wizard Bug
The Deployment Wizard is meant to allow you to deploy a project from BIDS without overwriting certain parts of the associated Analysis Services database; examples of things you'd not want to overwrite on the server include partitions, connection strings and security settings. However, once I started actually using this functionality with my customers over the last few months, I came across (or rather, was informed of by one of said customers, and was able to repro with others) a pretty major bug: when you select the 'Retain Roles and Members' option so that none of the security roles are overwritten, the wizard does indeed leave the roles themselves alone, but it sets cube access in all of them to None, and this in turn loses all the dimension security settings defined on the roles. So, be warned… it's with PSS at the moment but I'm told it probably won't be fixed in SP2.
UPDATE: I haven’t tested this myself, but according to Dan English this has been fixed in AS2008:
http://denglishbi.spaces.live.com/Blog/cns!CD3E77E793DF6178!490.entry
The Twelve Days of a BI Project
Here’s a Yuletide song for you and your colleagues to sing (to the tune of "The Twelve Days of Christmas", if you haven’t guessed already) before you disappear down the pub for an important offsite meeting this Christmas:
On the first day of work my consultants gave to me
A bill for their exorbitant fees.
On the second day of work my consultants gave to me
Two project plans
And a bill for their exorbitant fees.
On the third day of work my consultants gave to me
Three servers,
Two project plans
And a bill for their exorbitant fees.
On the fourth day of work my consultants gave to me
Four GB of RAM,
Three servers,
Two project plans
And a bill for their exorbitant fees.
On the fifth day of work my consultants gave to me
SQL 2005,
Four GB of RAM,
Three servers,
Two project plans
And a bill for their exorbitant fees.
On the sixth day of work my consultants gave to me
Six star schemas,
SQL 2005,
Four GB of RAM,
Three servers,
Two project plans
And a bill for their exorbitant fees.
On the seventh day of work my consultants gave to me
Seven ETL jobs,
Six star schemas,
SQL 2005,
Four GB of RAM,
Three servers,
Two project plans
And a bill for their exorbitant fees.
On the eighth day of work my consultants gave to me
Eight UDMs,
Seven ETL jobs,
Six star schemas,
SQL 2005,
Four GB of RAM,
Three servers,
Two project plans
And a bill for their exorbitant fees.
On the ninth day of work my consultants gave to me
Nine slow reports,
Eight UDMs,
Seven ETL jobs,
Six star schemas,
SQL 2005,
Four GB of RAM,
Three servers,
Two project plans
And a bill for their exorbitant fees.
On the tenth day of work my consultants gave to me
Ten lame excuses,
Nine slow reports,
Eight UDMs,
Seven ETL jobs,
Six star schemas,
SQL 2005,
Four GB of RAM,
Three servers,
Two project plans
And a bill for their exorbitant fees.
On the eleventh day of work my consultants gave to me
Eleven hotfixes,
Ten lame excuses,
Nine slow reports,
Eight UDMs,
Seven ETL jobs,
Six star schemas,
SQL 2005,
Four GB of RAM,
Three servers,
Two project plans
And a bill for their exorbitant fees.
On the twelfth day of work my consultants gave to me
Twelve angry users,
Eleven hotfixes,
Ten lame excuses,
Nine slow reports,
Eight UDMs,
Seven ETL jobs,
Six star schemas,
SQL 2005,
Four GB of RAM,
Three servers,
Two project plans
And a bill for their exorbitant fees.
PerformancePoint CTP
I was away last week, so I missed the news that the first CTP of PerformancePoint was available for download from Connect. Since the world and his dog have subsequently blogged about it I haven’t bothered until now; I have to admit that I’ve not got round to downloading or installing it yet either (though it will be providing me with justification for buying a new server to install it on in the sales). Other bloggers like Patrick Husting, Charlie Maitland and Ian Tien are probably better bets for the latest news in this area. I’m a little bit suspicious of the amount of hype surrounding it too… hmm, well, we’ll see what it’s like next summer.