Naming Objects in Analysis Services

How you should go about naming objects in Analysis Services is another hobby horse of mine – I had a small rant about it in “Expert Cube Development with SQL Server Analysis Services 2008” and I thought I’d mention it here because it’s sure to provoke some debate.

My guiding principle when naming objects (cubes, dimensions, attributes, measure groups or measures) in Analysis Services is to think like you’re designing a user interface and not a database. That’s because a cube really is a kind of user interface: it’s a friendly, easy-to-query window onto all of the data in your data warehouse for non-technical users. That means that any object that the user is likely to see in the metadata, and which will appear in any reports, pivot tables or other queries the user generates, should have a name they understand and want to see. That also means that any techy naming conventions you follow in your relational database design should be completely ignored because, while they might make sense to you as an IT professional, they are likely to be gobbledegook to your users.

The commonest example of bad practice that I see is having Analysis Services dimensions called “Dim Something” – “Dim Product”, “Dim Time”, “Dim Customer” and so on. Hands up who has a cube with dimensions named like this? OK, feel ashamed. Ask yourself, do your users want to go to the CEO with an Excel spreadsheet containing column headers like this? No, of course not, they want to see “Product”, “Time” and “Customer”. They know these things are dimensions already and the “Dim” bit is totally redundant. Similarly, they don’t want to see measures called “Sls_Pct_Chg” or attributes called “CustID” or any of that; and even if you come up with what seems like a reasonable, human-readable name yourself, if it’s still not what the users want they’ll do their best to change it. By not giving them the names they want you’re generating extra work and hassle for them, putting them off using the cube, and making it more likely that different users will come up with different names for the same thing in reports.

Of course this means you have to go and talk to your users about what they want to have their objects called. Since changing an object’s name can end up breaking calculations and any reports that your users have already defined, you need to do this right at the beginning of your project, even before you’ve run any wizards for the first time. You still need to make sure the names make sense, are consistent, and are acceptable to the whole user community, but ultimately it’s them making the decisions and not you. And if it’s too late to change things now on your current project, remember this post the next time you set about building a cube!

BI.Quality

Here’s something interesting that I’ve just seen on Hilmar Buchta’s blog: a new, open source testing suite for the Microsoft BI stack (including SSAS) called BI.Quality. It’s available for download from Codeplex here:
http://biquality.codeplex.com/

Unfortunately, at the time of writing, the documentation’s only in German although an English user manual is promised. From what I can make out, though, it looks like it could be really useful.

Cell Security and Calculated Members Defined in the WITH Clause or Session

I was asked an interesting question today about cell security – how can you get it to work with calculated members defined in the WITH clause or the session? If, for example, you create a role on the Adventure Works database with the following expression in the Read Permissions box for cell security:

[Geography].[Geography].currentmember is [Geography].[Geography].[Country].&[Australia]
AND
( [Measures].currentmember IS [Measures].[Reseller Sales Amount]
OR
[Measures].currentmember IS [Measures].[Reseller Order Count])

If you then run the following query when connecting via the role:

WITH
MEMBER MEASURES.TEST AS
[Measures].[Reseller Sales Amount] * [Measures].[Reseller Order Count]
SELECT
{MEASURES.TEST, [Measures].[Reseller Sales Amount], [Measures].[Reseller Order Count], [Measures].[Reseller Gross Profit]}
ON 0,
[Geography].[Geography].MEMBERS
ON 1
FROM [Adventure Works]

You’ll see the following results:

[image: query results]

This is pretty much what you’d expect – you only see values for Australia for Reseller Sales Amount and Reseller Order Count. The problem here is how you can grant access so the user can see the measure TEST which has been defined in the WITH clause. If you try the following expression in the role:

[Geography].[Geography].currentmember is [Geography].[Geography].[Country].&[Australia]
AND
( [Measures].currentmember IS [Measures].[Reseller Sales Amount] OR [Measures].currentmember IS [Measures].[Reseller Order Count] OR [Measures].currentmember IS [Measures].TEST)

You see the following error in a few cells in the query above:
#Error CellPermission (3, 162) The member '[TEST]' was not found in the cube when the string, [Measures].[TEST], was parsed.

And in any case, how can you know in advance what the names of the calculated measures you want to grant access to are going to be?

What we actually want to do is retain control of all measures, calculated or otherwise, that are defined on the cube, but allow access to any calculated measures defined in the WITH clause or the session. The way to do it is to rely on the fact that cell security is evaluated after the MDX Script and do the following. First define a named set on the cube at the end of the MDX Script something like this:

CREATE SET CURRENTCUBE.ALLMEASURES AS MEASURES.ALLMEMBERS;

This set contains all of the measures, including calculated measures, that have been created in the MDX Script – except those that have their Visible property set to False (these you’d have to add to the set manually).
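If you do have hidden calculated measures that need covering, the set definition could union them in manually – something like this sketch, where [Measures].[My Hidden Measure] is a purely hypothetical hidden calculated measure:

```mdx
CREATE SET CURRENTCUBE.ALLMEASURES AS
MEASURES.ALLMEMBERS
+ {[Measures].[My Hidden Measure]};
```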

Then use an expression such as this in your role:

[Geography].[Geography].currentmember is [Geography].[Geography].[Country].&[Australia]
AND
( [Measures].currentmember IS [Measures].[Reseller Sales Amount] OR [Measures].currentmember IS [Measures].[Reseller Order Count] OR
Count(Intersect({[Measures].currentmember},{ALLMEASURES}))=0)

The last line is the new part: it checks to see whether the currentmember on Measures is included in the set we’ve just defined. If it is, we can be sure it’s defined on the cube. If it isn’t, it must be defined either in the WITH clause or in the session so we can grant access to it. If you then run the query again, you get the following results:

[image: query results, now including a value for TEST]

As you can see, we can now see the value for TEST. Of course you also need to be extremely careful that you’re not opening up a security hole by doing this, so test thoroughly and use Read Contingent permissions as appropriate.

Ragged Hierarchies, HideMemberIf and MDX Compatibility

Here’s something that I suspect a few people out there will be surprised I didn’t know – but as far as I’m concerned, if I didn’t know it then it’s worth blogging about.

Anyway, it regards ragged hierarchies, the HideMemberIf property and the MDX Compatibility connection string property. Now you probably know that if you want to turn a user hierarchy into a ragged hierarchy (perhaps to avoid using a parent child hierarchy) you need to use the HideMemberIf property on the user hierarchy. For example, imagine you were using the following SQL query as the basis of your customer dimension:

SELECT 'UK' AS Country, 'Bucks' AS State, 'Amersham' AS City, 'Chris Webb' AS Customer
UNION ALL
SELECT 'Italy' AS Country, 'BI' AS State, 'Biella' AS City, 'Alberto Ferrari' AS Customer
UNION ALL
SELECT 'Vatican' AS Country, 'Vatican' AS State, 'Vatican' AS City, 'The Pope' AS Customer

We could build a dimension off this with attributes for Country, State, City and Customer, and for two out of our three customers that would be fine. However the Pope lives in the Vatican, which is (at least for the purposes of this exercise) a Country with no concept of City or State; and in the case of customers who live in the Vatican, we just want to be able to drill down on the Country ‘Vatican’ and see all of the Customers who live there without drilling down through a meaningless State and City.

So what we can do is build a user hierarchy on our dimension with levels Country, State, City and Customer, and on the lower three levels set the HideMemberIf property to OnlyChildWithParentName:

[image: HideMemberIf set to OnlyChildWithParentName on the lower levels of the hierarchy]

Then, with any sensible client tool, we can connect to the cube and browse the dimension as we want:

[image: the ragged hierarchy browsed in a client tool]

I said ‘sensible’ client tool because, of course, this only works if you set:
MDX Compatibility=2
…in the connection string. And of course Excel 2007 hard-codes MDX Compatibility=1 in the connection string and doesn’t allow you to change it, so you can’t use ragged hierarchies properly.
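For example, assuming a client tool that lets you edit the full OLE DB connection string, it would look something like this (the server and database names here are just placeholders):

```
Provider=MSOLAP.4;Data Source=myserver;Initial Catalog=Adventure Works DW 2008;MDX Compatibility=2
```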

This much I knew.

However, what I didn’t realise until last week when I was moaning about this to TK Anand from the SSAS dev team at PASS, is that for some ragged hierarchies you don’t need to set the MDX Compatibility connection string property at all.

For example, if in our case we duplicate the Customer upwards rather than the Country downwards, like so:

SELECT 'UK' AS Country, 'Bucks' AS State, 'Amersham' AS City, 'Chris Webb' AS Customer
UNION ALL
SELECT 'Italy' AS Country, 'BI' AS State, 'Biella' AS City, 'Alberto Ferrari' AS Customer
UNION ALL
SELECT 'Vatican' AS Country, 'The Pope' AS State, 'The Pope' AS City, 'The Pope' AS Customer

…and then build the dimension, setting HideMemberIf on our user hierarchy to OnlyChildWithParentName, we can get the result we want without setting the MDX Compatibility property. Here’s a screenshot of this new dimension in Excel just to prove it:

[image: the new dimension browsed in Excel]

The difference here is that we’re hiding all members below the State level right down to the bottom of the hierarchy, rather than hiding members somewhere in the middle of the hierarchy. Truly, this is one of those “Doh, if only I’d known!” moments… it at least means that in some of the scenarios where you’d use ragged hierarchies you can get them to work with Excel, even if it means hacking the data (‘The Pope’ is a Customer, not a State or a City). On the other hand there are plenty of scenarios where you do need to hide members in the middle of a hierarchy, and frankly I don’t see why Excel 2007 can’t set MDX Compatibility=2 in its connection string so they work properly.

Live Blogging @PASS – SSAS Consolidation and Virtualisation

Here are some notes from the SQLCat team’s session on SSAS consolidation and virtualisation; they’re a bit fragmentary since I’m too busy paying attention to what’s being said! I get asked about these issues by my customers all the time.

  • Use Windows System Resource Manager to control how many resources SSAS can use. For more on WSRM see http://technet.microsoft.com/en-us/library/cc755056.aspx
  • If SSAS and SQL are on the same server, use the Shared Memory protocol to improve processing performance
  • Also since resource usage requirements for SSAS and SQL will be different when processing and when querying, if they’re on the same box you can use WSRM to dynamically change resource allocations at different times.
  • When consolidating multiple SSAS databases on the same machine, it can be good to use multiple instances (maybe one per database) to give finer control over resource usage, service packs etc.
  • Someone asked the question of whether there is an overhead to using multiple instances each with one database rather than using a single instance with multiple databases. Answer: multiple instances would perform better but use slightly more resources; better to start with a single instance and only move to multiple instances when you have a good reason to do so.
  • Tests were run comparing SSAS running on bare metal and on Hyper-V – there was no difference in query performance between the two, but the Storage Engine (processing, and SE activity when querying) uses 1.5 times more threads on Hyper-V (you can modify the default number of threads available for processing, and may therefore need to change this for Hyper-V).
  • Description of a custom-built system for load balancing SSAS developed by the MSSales team inside Microsoft. Code and white paper will be available in a few months.
  • There were various issues with Synchronization in SSAS that have been fixed in the latest CUs.
  • IIS7 performs much better for HTTP access to SSAS – it performs as well as a direct connection. I’m sure I also heard that some performance issues with HTTP access over slow networks have also been fixed in the latest CUs.

The most useful session so far at this conference for me – I learned a lot.

Quest add support for SSAS monitoring

Something I saw yesterday at PASS: Quest now have support for monitoring SSAS from their “Spotlight on SQL Server Enterprise” product. See

http://www.quest.com/newsroom/news-releases-show.aspx?contentid=10602
http://www.quest.com/spotlight-on-SQL-Server-enterprise/features-benefits.aspx

It’s pretty basic at the moment – they capture some Perfmon counters and data from schema rowsets, but no trace data – and nowhere near as sophisticated as what SQLSentry have, but it’s good to see another vendor entering this market.

Actions and Multiselect

At the beginning of this week a customer asked me why, in a certain third-party client tool that shall remain nameless, they could no longer do a drillthrough when they did a multiselect on a filter axis. It seemed a bit weird to me, and it got weirder when I asked around for ideas and Greg Galloway pointed out that Excel 2007 didn’t show any actions at all when there was a multiselect, and Marco Russo noted that the current beta of Excel 2010 didn’t either. This made me wonder whether the problem was in fact with Analysis Services rather than the client tools…

I didn’t actually know how a client tool worked out what actions were available when, so I did some research and found out that the MDSCHEMA_ACTIONS schema rowset was how it was done. Here’s the documentation on MSDN:
http://msdn.microsoft.com/en-us/library/ms126032.aspx

For example, if a client tool needs to know which actions can be called when a user clicks on a cell in a resultset, then it will execute an XMLA command something like this one on Adventure Works:

<Discover xmlns="urn:schemas-microsoft-com:xml-analysis">
  <RequestType>MDSCHEMA_ACTIONS</RequestType>
  <Restrictions>
    <RestrictionList xmlns="urn:schemas-microsoft-com:xml-analysis" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
      <CUBE_NAME>Adventure Works</CUBE_NAME>
      <ACTION_TYPE>401</ACTION_TYPE>
      <COORDINATE>([Customer].[Country].&amp;[Australia],[Measures].[Internet Order Quantity])</COORDINATE>
      <COORDINATE_TYPE>6</COORDINATE_TYPE>
    </RestrictionList>
  </Restrictions>
  <Properties>
  </Properties>
</Discover>

You can see that a tuple is being passed into the COORDINATE to indicate which cell we’re interested in. But when there’s a multiselect, which cell in the cube are we actually clicking on? Good question… Different client tools handle multiselect in different ways, and it turns out there’s no way of telling SSAS you’re doing a multiselect in this situation. If you try to pass a set of tuples to the COORDINATE you get no actions returned, for instance.

Having talked this over with Akshai Mirchandani from the dev team, what the client tool needs to do is to make multiple calls to MDSCHEMA_ACTIONS, one for each member selected in the multiselect. It then needs to work out from each of the rowsets returned which actions should be available in the current context – and of course, in this case, there’s a good chance that different client tools will do different things (if they do anything at all). Not ideal.
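So, sketching this out: if the user had multiselected, say, Australia and Canada in the filter, the client would need to issue the Discover above once per selected member and then combine the actions returned across all the calls. The second call might look like this (following the pattern of the earlier request; the Canada member is just an illustration):

```xml
<Discover xmlns="urn:schemas-microsoft-com:xml-analysis">
  <RequestType>MDSCHEMA_ACTIONS</RequestType>
  <Restrictions>
    <RestrictionList>
      <CUBE_NAME>Adventure Works</CUBE_NAME>
      <ACTION_TYPE>401</ACTION_TYPE>
      <COORDINATE>([Customer].[Country].&amp;[Canada],[Measures].[Internet Order Quantity])</COORDINATE>
      <COORDINATE_TYPE>6</COORDINATE_TYPE>
    </RestrictionList>
  </Restrictions>
  <Properties>
  </Properties>
</Discover>
```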

To be honest, this really needs to be something that is solved in SSAS rather than on the client, and the key to solving it properly would be to have a standard way of handling and detecting multiselect in MDX. As Mosha hinted here, it’s something that’s been on the dev team’s radar for a while but it’s still not made it into the product unfortunately. In the meantime, if there are any client tool developers from the Excel team or third parties out there reading this, it would be great if you could at least do something rather than nothing here!

Building a Better Cache-Warmer, Part 1: the Storage Engine Cache

SSAS caching is an interesting subject: I’ve blogged numerous times about when caching can and can’t take place, and one of the most popular posts I’ve ever written was on how to build a cache-warmer in SSIS. However that last post was written a while ago, before the CREATE CACHE statement for warming the Storage Engine cache was reintroduced in SSAS 2005 SP2, and I’ve not been satisfied with it for a number of other reasons too. So as a result I’ve started to do more research into the issue and, with the help of Mosha, Akshai Mirchandani and chapter 29 of the indispensable “Microsoft SQL Server 2008 Analysis Services Unleashed”, I’ve been collecting my thoughts on how to build a better cache warmer with a view to blogging about it again. This first post will deal with the relatively easy bit – warming the SE cache – and when I have time I’ll write about the more difficult but more useful task of warming the Formula Engine cache; hopefully by then I’ll be in a position to create the definitive SSIS cache warmer package and be able to blog about that too.

Before I go on I need to mention that one other reason that has made me want to revisit this subject is that, at his MDX Deep Dive session at PASS last year, Mosha asserted that cache-warming was a Bad Thing. This intrigued me, since it seemed like such an obviously Good Thing to do, but of course it turns out he had good reason for saying what he did… SSAS caching is a lot more complex than it first appears and cache warming done badly can be counter-productive.

I’m also tempted to say that if you don’t need to warm the FE cache (ie you don’t have any complex calculations that take a long time to run) then there’s not much point in warming the SE cache. In my experience, if you have the right aggregations built then you can get excellent response times even on a cold cache. Of course to build aggregations you need to know the granularities of data that your users’ queries request, but you need the same information to warm the cache. The only difference would be that you can make your cache warmer much more dynamic than your aggregation design: if your users’ query patterns change regularly it would be difficult to keep your aggregation design in synch with them, whereas with a cache-warmer you can record the most recent queries that have been run and warm the cache accordingly.

There are a number of things to bear in mind when warming the SE cache:

  • We must not try to cache too much data. If we go over the LowMemoryLimit data will start to be evicted from the cache, which means we get no benefit from that point on; if we go over the TotalMemoryLimit it’s likely that just about the entire cache will be emptied. We also have to remember that the cache will grow as a result of normal usage during the day, so we should plan to use only a fraction of the available memory when warming the cache.
  • We want to avoid cache fragmentation. The SE cache uses a data structure called the data cache registry to store its data, and when the FE requests a (potentially filtered) subcube of data from the SE that data subsequently gets stored in the SE cache as an entry in the data cache registry. As a result it is possible that instead of having one entry in the SE cache that contains a lot of data, the same data could be spread across multiple smaller entries. This is bad because:
    • It means that lookup in the cache is slower as there are more entries to search
    • The SE doesn’t bother to continue searching in the data cache registry after it has compared the current request with more than 1000 subcubes, and will then go to disk even if the required data is in the cache. If the cache is filled with lots of small entries then this will happen more often.
    • If the FE requests the slice {A,B} from a hierarchy in a subcube and the data for those two members A and B are held in different entries in the cache, then it will not be found. It will only be matched if a single entry in the cache contains data for both A and B.
  • Data in the SE cache can be aggregated to answer a request – but only if the data itself is aggregatable, and this is not the case when there are distinct count measures in the measure group or when there are many-to-many relationships present. Assuming the data in the cache is aggregatable, though, we should perform SE cache warming at lower rather than higher granularities – although if we go too far down we’ll end up caching data we don’t need, using up too much memory, and finding that the cache warming itself takes too long.

Of course all of the above is subject to change in later versions, but I understand that this is the current situation with SSAS 2008 SP1.

How can we find out what data to use to warm the SE cache then? The MDX queries that have been run by users aren’t much help, but we can see what subcubes are being requested when a query is run by looking at the Query Subcube and Query Subcube Verbose events in Profiler; we could therefore run a Profiler trace to capture this data, but in fact the easiest option is to just use the Usage Based Optimization wizard’s Query Log since it contains the same data as Query Subcube. The following blog entry from James Snape shows how to interpret this information:
http://www.jamessnape.me.uk/blog/2006/11/09/SubcubeQueries.aspx 
A truly sophisticated SE cache warmer would be able to take this data, along with data about attribute relationships, and work out what the most recent, frequent (but slowest) subcube requests were, then try to find some lowest (but not too low) common granularities to use for cache warming, perhaps also slicing so that only the most recent time periods were used as well.

Last of all, how do we actually go about loading data into the SE cache? We could run MDX queries but this is a bit of a risky approach since we may not know what calculations are present and so we can’t be sure that our query requests the right data; the best option is to use the CREATE CACHE statement that is described here:
http://blogs.msdn.com/sqlcat/archive/2007/04/26/how-to-warm-up-the-analysis-services-data-cache-using-create-cache-statement.aspx
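To give a rough idea of what this looks like, a CREATE CACHE statement names the cube and then a subcube expression describing the granularity of data to load into the SE cache – something like this sketch (see the linked article for the exact syntax; the measure and granularities here are purely illustrative):

```mdx
CREATE CACHE FOR [Adventure Works] AS
({[Measures].[Internet Sales Amount]},
 [Date].[Calendar Year].[Calendar Year].MEMBERS,
 [Product].[Category].[Category].MEMBERS)
```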

Slow metadata

There’s an interesting thread on the Analysis Services MSDN Forum that I’ve been following for a while now (it’s over 40 posts long at the time of writing) that gives a good explanation of why metadata access on very large or complex cubes can be so painfully slow. Here’s the link:

http://social.msdn.microsoft.com/Forums/en-US/sqlanalysisservices/thread/679b510f-3d42-4eed-ba2c-df2a2bd39a00

In summary, it turns out that the more files there are on disk in your SSAS database folder, the slower metadata reads become, and when you reach very large numbers of files almost any kind of metadata access becomes incredibly slow. There are a lot of different factors determining how many files are created for your SSAS database but large numbers of cubes, attributes, partitions and aggregations (which end up being multiplied across each partition) are the main culprits; the problem is that the way the EstimatedSize property on metadata objects has been implemented results in SSAS having to iterate over all the files in every subdirectory in the database’s directory in the data folder to find their size.

One workaround is to set the Storage Location property of your partitions to somewhere outside the data directory since this means they won’t get scanned, but doing this makes backup and restore difficult and means you get inaccurate size estimates. The only other option is to reduce the number of partitions and aggregations you have; given that the SQLCat team are now saying that partitions of 50-60 million rows each are feasible in 2008, and that you can use different partition designs for partitions containing old and new data, there’s probably a lot of scope for doing this in most cubes. Hopefully, though, this problem will be fixed in a future release…

adCenter and SSAS 2008 White Paper

There’s a new white paper on how adCenter uses Analysis Services 2008 available here:
http://technet.microsoft.com/en-us/library/ee410017.aspx

For the most part it’s a fairly run-of-the-mill list of reasons to upgrade to 2008 (and, to be honest, if you’re running 2005 I can’t think of a good reason not to upgrade to 2008 – the performance is so much better) but there are several good nuggets on performance tuning for large SSAS implementations; for example, table 3 has a list of ini file settings for optimising processing. Worth a look.