Building a Better Cache-Warmer, Part 1: the Storage Engine Cache

SSAS caching is an interesting subject: I’ve blogged numerous times about when caching can and can’t take place, and one of the most popular posts I’ve ever written was on how to build a cache-warmer in SSIS. However, that last post was written a while ago, before the CREATE CACHE statement for warming the Storage Engine cache was reintroduced in SSAS 2005 SP2, and I’ve not been satisfied with it for a number of other reasons too. As a result I’ve started to do more research into the issue and, with the help of Mosha, Akshai Mirchandani, and chapter 29 of the indispensable “Microsoft SQL Server 2008 Analysis Services Unleashed”, I’ve been collecting my thoughts on how to build a better cache-warmer with a view to blogging about it again. This first post will deal with the relatively easy bit – warming the SE cache; when I have time I’ll write about the more difficult but more useful task of warming the Formula Engine cache, and hopefully by then I’ll be in a position to create the definitive SSIS cache-warmer package and blog about that too.

Before I go on I need to mention that one other reason that has made me want to revisit this subject is that, at his MDX Deep Dive session at PASS last year, Mosha asserted that cache-warming was a Bad Thing. This intrigued me, since it seemed like such an obviously Good Thing to do, but of course it turns out he had good reason for saying what he did… SSAS caching is a lot more complex than it first appears and cache warming done badly can be counter-productive.

I’m also tempted to say that if you don’t need to warm the FE cache (i.e. you don’t have any complex calculations that take a long time to run) then there’s not much point in warming the SE cache. In my experience, if you have the right aggregations built you can get excellent response times even on a cold cache. Of course, to build aggregations you need to know the granularities of data that your users’ queries request, but you need the same information to warm the cache. The only difference is that you can make your cache-warmer much more dynamic than your aggregation design: if your users’ query patterns change regularly it would be difficult to keep your aggregation design in sync with them, whereas with a cache-warmer you can record the most recent queries that have been run and warm the cache accordingly.

There are a number of things to bear in mind when warming the SE cache:

  • We must not try to cache too much data. If we go over the LowMemoryLimit data will start to be evicted from the cache, which means we get no benefit from that point on; if we go over the TotalMemoryLimit it’s likely that just about the entire cache will be emptied. We also have to remember that the cache will grow as a result of normal usage during the day, so we should plan to use only a fraction of the available memory when warming the cache.
  • We want to avoid cache fragmentation. The SE cache uses a data structure called the data cache registry to store its data, and when the FE requests a (potentially filtered) subcube of data from the SE that data subsequently gets stored in the SE cache as an entry in the data cache registry. As a result it is possible that instead of having one entry in the SE cache that contains a lot of data, the same data could be spread across multiple smaller entries. This is bad because:
    • It means that lookup in the cache is slower as there are more entries to search
    • The SE doesn’t bother to continue searching in the data cache registry after it has compared the current request with more than 1000 subcubes, and will then go to disk even if the required data is in the cache. If the cache is filled with lots of small entries then this will happen more often.
    • If the FE requests the slice {A,B} from a hierarchy in a subcube and the data for those two members A and B are held in different entries in the cache, then it will not be found. It will only be matched if a single entry in the cache contains data for both A and B.
  • Data in the SE cache can be aggregated to answer a request – but only if the data itself is aggregatable, and this is not the case when there are distinct count measures in the measure group or when there are many-to-many relationships present. Assuming the cache is aggregatable, though, we should perform SE cache warming at lower rather than higher granularities; although if we go too far down we’ll end up caching data we don’t need, using up too much memory and finding that the cache warming itself takes too long.

Of course all of the above is subject to change in later versions, but I understand that this is the current situation with SSAS 2008 SP1.

How can we find out what data to use to warm the SE cache then? The MDX queries that have been run by users aren’t much help, but we can see what subcubes are being requested when a query is run by looking at the Query Subcube and Query Subcube Verbose events in Profiler; we could therefore run a Profiler trace to capture this data, but in fact the easiest option is to just use the Usage Based Optimization wizard’s Query Log since it contains the same data as Query Subcube. The following blog entry from James Snape shows how to interpret this information: 
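If you have the query log enabled (via the QueryLog\QueryLogConnectionString and QueryLogTableName server properties), the data ends up in a relational table – dbo.OlapQueryLog by default. A query along these lines (a sketch, assuming the default table name and columns) shows which subcubes are requested most often and which take longest, which is exactly the information a cache-warmer needs:

```sql
-- Most frequently requested subcubes, ranked by frequency and average duration
-- (assumes the default OlapQueryLog table created by the query log feature)
SELECT
    MSOLAP_ObjectPath,          -- the measure group the request hit
    Dataset,                    -- string of 0s/1s showing which attribute granularities were requested
    COUNT(*)      AS Requests,
    AVG(Duration) AS AvgDurationMs
FROM dbo.OlapQueryLog
GROUP BY MSOLAP_ObjectPath, Dataset
ORDER BY Requests DESC, AvgDurationMs DESC;
```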
A truly sophisticated SE cache warmer would be able to take this data, along with data about attribute relationships, and work out what the most recent, frequent (but slowest) subcube requests were, then try to find some lowest (but not too low) common granularities to use for cache warming, perhaps also slicing so that only the most recent time periods were used as well.

Last of all, how do we actually go about loading data into the SE cache? We could run MDX queries but this is a bit of a risky approach since we may not know what calculations are present and so we can’t be sure that our query requests the right data; the best option is to use the CREATE CACHE statement that is described here:
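To give a flavour of it, a statement like the following loads a subcube of Internet Sales data into the SE cache at the Calendar Year by Country granularity; the measure and granularities here are purely illustrative – in practice you’d choose them based on your query log analysis:

```mdx
-- Warm the Storage Engine cache with a single subcube of data
CREATE CACHE FOR [Adventure Works] AS
'([Measures].[Internet Sales Amount],
  [Date].[Calendar Year].[Calendar Year].MEMBERS,
  [Customer].[Country].[Country].MEMBERS)'
```

Note that CREATE CACHE populates the SE cache only; it deliberately bypasses the Formula Engine, which is what makes it safer than running ordinary MDX queries for this purpose.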

Slow metadata

There’s an interesting thread on the Analysis Services MSDN Forum that I’ve been following for a while now (it’s over 40 posts long at the time of writing) that gives a good explanation of why metadata access on very large or complex cubes can be so painfully slow. Here’s the link:

In summary, it turns out that the more files there are on disk in your SSAS database folder, the slower metadata reads become, and when you reach very large numbers of files almost any kind of metadata access becomes incredibly slow. There are a lot of different factors determining how many files are created for your SSAS database but large numbers of cubes, attributes, partitions and aggregations (which end up being multiplied across each partition) are the main culprits; the problem is that the way the EstimatedSize property on metadata objects has been implemented results in SSAS having to iterate over all the files in every subdirectory in the database’s directory in the data folder to find their size.

One workaround is to set the Storage Location property of your partitions to somewhere outside the data directory since this means they won’t get scanned, but doing this makes backup and restore difficult and means you get inaccurate size estimates. The only other option is to reduce the number of partitions and aggregations you have; given that the SQLCat team are now saying that partitions of 50-60 million rows each are feasible in 2008, and that you can use different partition designs for partitions containing old and new data, there’s probably a lot of scope for doing this in most cubes. Hopefully, though, this problem will be fixed in a future release…

adCenter and SSAS 2008 White Paper

There’s a new white paper on how adCenter uses Analysis Services 2008 available here:

For the most part it’s a fairly run-of-the-mill list of reasons to upgrade to 2008 (and, to be honest, if you’re running 2005 I can’t think of a good reason not to upgrade to 2008 – the performance is so much better) but there are several good nuggets on performance tuning for large SSAS implementations; for example, table 3 has a list of ini file settings for optimising processing. Worth a look.

Top Ten Reasons to come to SQLBits

It seems like every conference nowadays has to come up with a ‘top ten’ of reasons to attend, doesn’t it? Well, we at SQLBits thought we shouldn’t be left out and I got the job of writing it. So, even though it’s already been used in one of our promotional emails and Simon’s already put them on his blog, I thought I’d post them up here:

10. We’re going West (Newport, to be exact)! Which is great news if you live in Wales or the South West and you get fed up with all the tech conferences being held in London or the Thames Valley. Even if you don’t live in the local area it’s only 1hr 45 by car or train from London, it’s close to Birmingham, and with Bristol and Cardiff airports nearby it’s easy to get to from anywhere in Europe.

9. SQLBits has grown to three days. We have a day of pre-conference seminars on Thursday November 19th; on Friday November 20th we have a special SQL 2008 and SQL 2008 R2 day; and on Saturday November 21st we have our flagship free community conference. That’s probably all the SQL Server education you’ll want or need this year.

8. Meet other SQL Server professionals. Are you the only DBA in the village? Do people think you’re talking about your private life when you say you’re a “BI consultant”?  Do you have to pretend to be an estate agent or a lawyer at parties to avoid the stigma of being seen as an IT geek? At SQLBits you’re among friends: no-one will mind you talking about clustered indexes or the finer points of MDX syntax.

7. Attend an advanced technical training course. Got some budget for a training course, but don’t want to spend it on yet another basic Microsoft Official Curriculum course where the instructor only read the course notes the night before? Our pre-conference seminars are run by world-class SQL Server experts and will give you the knowledge you need to do your job. Our big draw is Donald Farmer doing a whole day on Gemini and self-service BI, but I’ll be running an introductory MDX course too…

6. Win some swag, drink some beer and eat some pizza. It’s not all work, work, work at SQLBits: we have great giveaways, there are loads of opportunities for socialising and there’s beer, pizza and games at the end of the day. And if you like golf, you might be interested to know that the venue, Celtic Manor Resort, has three championship courses and is playing host to the 2010 Ryder Cup.

5. Check out the latest SQL Server-related products. Don’t have time to download and test the latest tools and utilities? Our sponsors will be happy to demo them for you and provide eval versions. They’ll probably also have loads of those cheap plastic promotional freebies that your kids love.

4. Find out what’s coming in the next version. Our new SQLBits Friday event will give you the lowdown on what’s new in SQL 2008 and what’s coming in SQL 2008 R2 (including Madison, Gemini, StreamInsight, Master Data Services and changes in Reporting Services), so you can plan for the future more effectively.

3. Bring the family. Want to come to SQLBits, but feel guilty about leaving the family at home? No need, they can come too! The venue has its own swimming pools, a spa, tennis courts and restaurants as well as the golf courses. Repeat after me: “It’s not a conference, darling, it’s a luxury weekend break…”

2. Deep technical content. We won’t make you sit through hours of marketing guff from some sales guy who doesn’t know a database from a spreadsheet – SQLBits is all about SQL Server professionals sharing their hard-won technical knowledge with the community. We’ve got big names like Donald Farmer (program manager on the Analysis Services/Gemini dev team) and Thomas Kejser (from the SQLCat team), and MVPs like Simon Sabin, Allan Mitchell and me speaking.

1. Did we mention that SQLBits Saturday is FREE? Yes, as always, the SQLBits community day on Saturday November 21st is free to attend. Top-class speakers, a massive selection of sessions and 100% SQL Server focused – and you pay nothing. That’s 100% cheaper than most tech conferences!

So, what are you waiting for? Register now at:

Oh, and if you are already registered you might be interested to know that voting has just opened, so you can pick the sessions you’d most like to see.

Last of all, would you like to win free admission to the training day of your choice, the Friday event, and free hotel accommodation at Celtic Manor for the Thursday and Friday night? Then check out our mugshot competition:

Now() and the Formula Cache

You know, I have the nagging feeling I’ve already blogged about this before… but I can’t find the post (I have been at this for over four years now) so it’s worth mentioning again…

Anyway, a common question asked on the MSDN Forums is how to get the current date and then use it in an MDX calculation – see this thread for example. Usually this is because users want calculations that show the current day’s sales, or something similar. One answer is of course to use the Now() function, but what people don’t often realise is how this can impact the ability of Analysis Services to cache the values returned by calculated members, and therefore reduce overall query performance.

To understand why, let’s look at some examples in Adventure Works. First of all create a calculated measure on the AW cube as follows:


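A minimal definition along these lines is all that’s needed (a sketch – the member name just needs to match the query below):

```mdx
-- A calculated measure that simply returns the current date and time
CREATE MEMBER CURRENTCUBE.[Measures].[NOWDEMO]
AS Now(),
VISIBLE = 1;
```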
Then run the following query a few times:

SELECT {[Measures].[NOWDEMO]} ON 0
FROM [Adventure Works]

As you’d expect, every time you run this query you see the current date and time – and every time you run it, you see a different value. But, you may be thinking, doesn’t Analysis Services cache the results returned by calculated members? Well, yes it does in most cases, but for non-deterministic functions (functions that could return a different result every time they’re called) like Now() no caching takes place, because otherwise the value returned from the cache might be different from the one the function actually returns.

The next problem is that if you create any other calculated members that depend directly or indirectly on the value returned by a non-deterministic function, their values can’t be cached either. One problem I sometimes see in my consultancy work is poor query performance resulting from SSAS being unable to use the formula cache, because a large number of calculations depend on a single calculation that uses the Now() function. Here’s a greatly simplified example of two calculated members, the first of which finds the current year and the second of which returns a sales value for the year eight years before the current year:

CREATE MEMBER CURRENTCUBE.[Measures].[Current Year]
AS YEAR(vba!format(now(),"MM/dd/yyyy"));

CREATE MEMBER CURRENTCUBE.[Measures].[Sales Eight Years Ago]
AS ([Measures].[Internet Sales Amount],
STRTOMEMBER("[Date].[Calendar Year].&[" + CSTR([MEASURES].[Current Year] - 8) + "]", CONSTRAINED));

If you run a query that references the second calculated measure on a cold cache, such as:

SELECT {[Measures].[Sales Eight Years Ago]} ON 0
FROM [Adventure Works]

The first time you run it you’ll see SSAS going to disk as you’d expect; the second time you run it though you’ll see SSAS is able to use the Storage Engine cache but not the Formula Engine cache, as this Profiler trace shows:


The highlighted Get Data From Cache event shows data being retrieved from the measure group cache. This is ok and can be beneficial for query performance, but if the calculation we’re doing is very expensive then it can still mean our query takes a long time to run. We’ll only get an instant response on a warm cache if we can work out how to use the formula cache somehow.

Luckily, in most cases where Now() is used, we don’t actually want the system date and time, we just want the date. That means we only want to return a different value when the date changes, once every 24 hours. What we can do therefore is use a named set to store the member derived from the value returned by Now(), for example like this rewrite of the calculation above:

CREATE MEMBER CURRENTCUBE.[Measures].[Current Year]
AS YEAR(vba!format(now(),"MM/dd/yyyy"));

CREATE SET CURRENTCUBE.[Year To Use]
AS {STRTOMEMBER("[Date].[Calendar Year].&[" + CSTR([MEASURES].[Current Year] - 8) + "]", CONSTRAINED)};

CREATE MEMBER CURRENTCUBE.[Measures].[Sales Eight Years Ago]
AS ([Measures].[Internet Sales Amount], [Year To Use].Item(0).Item(0));
What we’re doing here is finding the year we want, then finding the member for the year 8 years ago and storing that in a named set. Because a static named set is only evaluated once, when we do our final calculation we can reference the single member stored in the named set and therefore make use of the formula cache as the following Profiler trace shows:


We’re now getting data from the flat cache, which is one part of the Formula Engine cache (not ideal either, as it indicates the calculation is being evaluated in cell-by-cell mode, I guess because we’re referencing a named set inside it), and so warm-cache performance will be better.

The next problem is that when the date does change, we need to clear the cache. This can be accomplished easily by running an XMLA ClearCache command, perhaps direct from SQL Server agent, every night at midnight or whenever necessary.
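A minimal ClearCache command, scoped to a single database, looks like this (the DatabaseID here is an assumption – substitute your own database’s ID):

```xml
<!-- Clears all SSAS caches for the named database;
     can be scheduled as a SQL Server Agent Analysis Services command -->
<ClearCache xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Object>
    <DatabaseID>Adventure Works DW 2008</DatabaseID>
  </Object>
</ClearCache>
```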

To be honest, though, I’m not sure using the Now() function at all is a good thing – apart from the issues described here there are a lot of other risks involved, such as the time or date on your server being wrong or confusions with time zones and date formats. I think a better approach to the problem is to have an extra attribute in your Time dimension which flags up a date as ‘today’, and which changes every day. Of course this means you need to do some extra ETL and processing on your Time dimension as a result, but I think it’s a much cleaner solution than Now() and leads to much more efficient MDX calculations.
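With such an attribute in place, a calculation for the current day’s sales needs no non-deterministic functions at all. Something like this sketch would work, where the attribute and member names are hypothetical and would depend on how your ETL populates the flag:

```mdx
-- Assumes a hypothetical [Is Today] attribute on the Date dimension,
-- updated by the nightly ETL so that exactly one date is flagged Yes
CREATE MEMBER CURRENTCUBE.[Measures].[Today Sales]
AS ([Measures].[Internet Sales Amount], [Date].[Is Today].&[Yes]);
```

Because nothing here is non-deterministic, the Formula Engine is free to cache the result as normal.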

Implementing IE8 Web Slices in an SSRS Report

One of the new features that caught my eye in Internet Explorer 8 when it came out was Web Slices – the ability for a web developer to carve up a page into snippets that a user can then subscribe to. There’s a brief overview of what they are here:
and a really good guide to implementing them from Nick Belhomme here:

Being the BI geek that I am, my first thought was to see whether they could be used with Reporting Services reports. After all, wouldn’t it be cool if you could subscribe to a table in an SSRS report, or even better a cell within a table, and get notified when that value changed rather than have to keep pinging the report yourself? Of course it would! Here’s how to do it…

The challenge with implementing Web Slices is getting SSRS to generate the necessary HTML when it renders your report. I first looked at using the new rich formatting functionality that’s available in SSRS 2008 that Teo Lachev describes here, but it turns out that you can’t use this to create Web Slices because SSRS doesn’t support the necessary attributes (see here for details – at least I assume this is why, because I couldn’t get it to work). The only way I could get it to work was to render the report as XML and then use an XSLT file to give me complete control over the HTML that SSRS generates. I won’t go into too much detail about how this works; once again, Teo has an excellent explanation in his book “Applied Microsoft SQL Server 2008 Reporting Services” (reviewed here) on pages 263-265. To be honest this isn’t a satisfying approach for me because it involves a lot more effort to get the report looking the way you want, and of course you have to have control over how the report is rendered. However, it still makes for a fun proof-of-concept 🙂
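For reference, the HTML a Web Slice needs is pretty simple: a container with the class hslice and an id, a child with the class entry-title, and optionally one with the class entry-content. This is the kind of output the XSLT has to produce (the id, title and table contents here are illustrative):

```html
<!-- Minimal Web Slice structure: IE8 looks for the hslice,
     entry-title and entry-content class names -->
<div class="hslice" id="internet-sales-slice">
  <p class="entry-title">Internet Sales by Country</p>
  <div class="entry-content">
    <table><!-- the report table rendered via XSLT goes here --></table>
  </div>
</div>
```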

The first thing I did was create a simple SSRS report in BIDS that brought back values for Internet Sales broken down by country:


I then rendered the report to XML, took a look at the XML generated, and created a simple XSLT file that would generate an HTML report from that XML. I then added the XSLT file to my project and associated it with my report using the report object’s DataTransform property, so that it was always used when the report was rendered to XML. I was then able to deploy the project and, by using URL access to render the report to XML and have the result treated as HTML, see the following in IE8:


Here’s an example SSRS URL that does this:

Then I went back to BIDS and altered the XSLT file to add the necessary tags for a Web Slice around the main table. When I went back to IE and reopened the report after deployment I could see two new things. First, the Web Slice button appeared in the IE toolbar:


And when I moved the mouse over the table in the report, it was highlighted with a green box as a Web Slice:


I could then click on either to subscribe to the Web Slice and have it added to my favourites. This then meant I could see the contents of the table in my Favourites bar whenever I wanted:


And whenever the data changes (you can control how often IE polls the original web page in the Web Slice’s properties, and also in the definition of the Web Slice itself) the text in the Favourites bar turns bold:


So there you are. Even with the limitations that having to render to XML imposes I can think of a few useful applications of this approach… maybe I’ll test them out in a future blog entry. Let me know if you have any ideas!

One last thing: I think it would be great (and involve relatively little dev work) if SSRS supported the creation of Web Slices out of the box. If you agree, please vote:

You can download my proof-of-concept SSRS project here:

DataWarehouse Explorer

Continuing my occasional series of SSAS client tool reviews, here’s another contender in the post-Proclarity power-user market: DataWarehouse Explorer, from Dutch company CNS International.

DWE is a standalone, ‘rich client’ application that gives you a lot more functionality than you get in Excel pivot tables and as such is competing in the same market that Proclarity Desktop Professional used to dominate and which is still pretty crowded. There’s also a web-based portal that you can publish reports to (see here for full details on the architecture) but if you want to build queries you need to do it on your desktop.

So what’s it like? I liked it: it hasn’t got any flashy features that mark it out particularly, but it does everything it needs to do and does it well. Probably the best thing is the UI – a nice Office 2007 look-and-feel and, most importantly, very clear and easy to use. As someone who has spent plenty of time working with Analysis Services over the last ten years or so, when I start using a new client tool I expect to be able to do what I want to do very easily: I know all the basic concepts of cubes, I know the Adventure Works cube, and I know the queries I want to run, so if I can’t work out how to do something then I lay the blame on the UI design. And if I can’t do something, there’s no point expecting an end user to be able to do it. In the case of DWE I had no problems at all, and in many respects it’s much easier to use than something like Proclarity or Excel. Here’s a screenshot:


The filter dialog provides a good example of how they’ve got the UI right. Filtering is something that every worthwhile client tool needs to do, but it’s easy to make it confusing for the user especially when you’re applying multiple conditions. The DWE filter dialog is uncluttered, shows all the filters you’ve already set up, makes it easy to add new ones or delete existing ones, and has a number of nice touches like the way it automatically formats any numeric conditions you enter to match the format string of the measure you’re filtering on.

DWE Filter

Other features worth mentioning include:

  • It mimics Excel 2007’s in-cell data bars and conditional formatting very closely. I like those features in Excel and things like this make DWE very easy to pick up for Excel users.
  • There’s a ‘Notes’ pane where you can add text commenting on the query you’ve built.
  • In the slicer pane, you can search for hierarchies by name – useful when you’ve got a lot of hierarchies and dimensions:
  • Similarly, the slicer pane can organise the hierarchies on slice according to which ones you’ve explicitly selected something on, ones where there is an implicit selection (for example because there’s no All Member or a specific Default Member has been set), and ones where there is no selection:
  • There’s a ‘Cube Dictionary’ feature that allows you to look at the metadata of objects on the server, for example to check the aggregation method that a measure uses:
  • The UI can be switched between English, Dutch, Portuguese and Spanish.
  • You can hide more difficult functionality by setting the ‘User Level’ option to ‘Basic’ or ‘Intermediate’ rather than the default of ‘Advanced’. Fewer buttons and options improves ease-of-use for new or less competent users.

Overall, then, a good product and one worth evaluating if you’re looking for a desktop-based SSAS client tool.
