Viewing Partition Slice Information

The other week I discovered that it was possible to view the partition slice information discussed here using the DISCOVER_PARTITION_DIMENSION_STAT schema rowset. I’ve always thought it would be cool to be able to visualise this information so you could easily check whether your DataID ranges overlapped; to do this, though, you need to see the slice information for all your partitions in one resultset, and DISCOVER_PARTITION_DIMENSION_STAT only returns data for one partition. So this week I dusted off my coding skills and wrote a sproc that loops through every partition in a measure group, calls DISCOVER_PARTITION_DIMENSION_STAT, concatenates the results and checks to see which partitions have ranges that overlap each other. I’ve added it to the source code for the Analysis Services Stored Procedure Project, which you can get hold of here:

http://www.codeplex.com/ASStoredProcedures

Note that the function isn’t part of the latest release; you will have to download the code and compile it yourself. As an example of what it does, if you connect to the Adventure Works database in SQL Server Management Studio and run the following statement:

call assp.DiscoverPartitionSlices("Adventure Works", "Internet Sales")

…you’ll get a resultset showing all the partition slice information for every attribute on every partition in the Internet Sales measure group.
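At heart, the overlap check the sproc performs is just interval intersection on the min/max DataIDs that DISCOVER_PARTITION_DIMENSION_STAT returns for each attribute. Here’s a minimal sketch of that logic in Python (the actual sproc is C#, and the partition names and slice ranges below are made up for illustration):

```python
# Sketch of the overlap check: each partition's slice for an attribute is an
# inclusive [min, max] DataID range; two partitions overlap when the ranges
# intersect.

def ranges_overlap(a, b):
    """True if two inclusive [min, max] DataID ranges intersect."""
    return a[0] <= b[1] and b[0] <= a[1]

def find_overlaps(slices):
    """slices: dict of partition name -> (min_dataid, max_dataid).
    Returns the list of partition pairs whose ranges overlap."""
    names = sorted(slices)
    return [(p, q) for i, p in enumerate(names) for q in names[i + 1:]
            if ranges_overlap(slices[p], slices[q])]

# Hypothetical slice ranges for the Date attribute, one partition per year:
slices = {
    "Internet_Sales_2001": (1, 340),
    "Internet_Sales_2002": (341, 705),
    "Internet_Sales_2003": (700, 1070),  # overlaps 2002's range
    "Internet_Sales_2004": (1071, 1280),
}
print(find_overlaps(slices))
```

If the ranges are healthy the list comes back empty; here the 2002 and 2003 partitions are flagged because their ranges share DataIDs 700-705.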

Having done this I was able to put together a simple SSRS report to display this information. I say simple, but it probably took me longer to create the report than to write the sproc because I couldn’t work out how to visualise the ranges in a useful way; in the end I used the RangeColumn chart style from Dundas, which seemed to be the best fit. Here’s an example of what the report returned for the Date attribute on the Date dimension and the Internet Sales measure group:

[Image: sliceDate – slice ranges for the Date attribute across the Internet Sales partitions]

I cheated a bit by setting the IndexBuildThreshold server property very low so that indexes would be created for the 2001 and 2002 partitions, but it shows what a healthy set of slices looks like (the Internet Sales measure group is partitioned by Calendar Year). Another attribute worth looking at is the Product attribute on the Product dimension:

[Image: sliceProd – slice ranges for the Product attribute across the Internet Sales partitions]

What we see here is that in 2001 and 2002 only a small number of products were actually sold compared to 2003 and 2004. In this situation it might be worth trying to reorder the members, or to set a slice on the 2001 and 2002 partitions, so that AS knows about this distribution: it would mean that if you ran a query for all time periods for a product that was only sold in 2003/2004, AS would not need to scan the 2001 and 2002 partitions because it would know in advance that there would be no data for that product in them. One of the improvements I would like to make to the sproc is to display member names and unique names as well as DataIDs so that the slice ranges are easier to understand; Mosha also had the idea that it would be good to be able to take these unique names and use them to set a slice on a partition, for exactly the scenario I’ve just described.

Lastly, I found what is almost certainly a bug in AS when I was looking at the Calendar Semester attribute on the Date dimension:

[Image: sliceCalSem – slice ranges for the Calendar Semester attribute across the Internet Sales partitions]

Calendar Semester has, as you would expect, a one-to-many relationship with the Year attribute, so there should be no overlaps – but as the report shows, that isn’t the case. When you run a query on this attribute, for example:

select measures.[internet sales amount] on 0,
{[Date].[Calendar Semester].&[2004]&[1]} on 1
from [adventure works]

…you see partitions being hit when they shouldn’t be – in this case the 2003 and 2004 partitions – and this sometimes seems to happen even when you have a slice explicitly set on the partition using the Year attribute, which is pretty bad. Greg Galloway did a bit of digging on this and discovered that the SQL generated returned rows in the same mixed-up order as the DataIDs of the members; fixing the SQL to return members in the order you want the DataIDs stops this happening, at least to a certain extent. Mosha says that it’s a problem with the decode tables created within the dimensions, but whatever the cause you may want to check your dimensions to see if it’s happening to you. This kind of problem has been noticed before, and I knew about the Disable Prefetch Facts connection string property that was introduced to deal with it, so I wonder if this is related in some way? Anyway, I logged the bug on Connect and I’ll blog about any new information that I get on it:

https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=308793
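To see why the mixed-up ordering Greg found can lead to overlapping slices, remember that DataIDs are handed out in the order members arrive during processing, and a partition’s slice is just the min/max of its members’ DataIDs. A toy illustration in Python (the semester names and arrival orders are invented, not taken from Adventure Works):

```python
def assign_dataids(arrival_order):
    """DataIDs are assigned in the order members are first seen."""
    return {m: i + 1 for i, m in enumerate(arrival_order)}

def slice_range(dataids, members):
    """The slice stored for a partition is the min/max of its members' DataIDs."""
    ids = [dataids[m] for m in members]
    return min(ids), max(ids)

# Semesters arriving in a sensible, hierarchical order:
good = assign_dataids(["2001 H2", "2002 H1", "2002 H2",
                       "2003 H1", "2003 H2", "2004 H1"])
# The same semesters arriving in a mixed-up order:
bad = assign_dataids(["2003 H1", "2001 H2", "2004 H1",
                      "2002 H1", "2003 H2", "2002 H2"])

for ids in (good, bad):
    print(slice_range(ids, ["2002 H1", "2002 H2"]),
          slice_range(ids, ["2003 H1", "2003 H2"]))
```

With the good ordering the 2002 and 2003 ranges are disjoint, (2, 3) and (4, 5); with the mixed-up ordering they become (4, 6) and (1, 5), which overlap, so both partitions get scanned.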

Fun and Games with Schema Rowsets

One of the (few) new features in AS2008 is Data Management Views, which give you a table-like way of looking at all kinds of useful metadata and management-style information. However, as Darren points out here:
http://geekswithblogs.net/darrengosbell/archive/2007/08/13/SSAS-2008-Data-Management-Views.aspx
…this isn’t really anything you can’t get from schema rowsets at the moment, and the great set of functions that Darren wrote for the Analysis Services Stored Procedure Project (http://www.codeplex.com/ASStoredProcedures/Wiki/View.aspx?title=XmlaDiscover&referringTitle=Home) make it very easy to get at this information in MDX.

I’ve been playing around with schema rowsets a bit over the last week or so, mostly because I’m working on an updated version of my SSIS cache-warmer (more of which soon), and it’s given me a number of interesting ideas for other things that could be possible. For instance, one of the big missing features from AS at the moment is, in my opinion, a query governor: it’s all too easy for one user to bring your server to its knees by running the query from hell. In the article above, Darren talks about the information in the DISCOVER_SESSIONS and DISCOVER_CONNECTIONS rowsets, and how you can use the ids they return to kill sessions and connections; both of them contain other interesting information (which probably needs some time spent researching to make sure I know exactly what it means) on the time since the last command, the CPU time taken by the last command, what the last command was, how long the last command took to run and so on. With this information I guess it would be possible to build a rudimentary query governor in SSIS: basically a package, scheduled to run every ten seconds or so, that checked the overall CPU usage (or other perfmon counters) on the machine and, if it was greater than a certain threshold, looked at the sessions that were open, tried to work out which were the guilty parties and killed them.
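The decision logic for that polling package is simple enough to sketch. Here’s what it might look like in Python, with invented thresholds and field names loosely modelled on the DISCOVER_SESSIONS columns – the real package would read the rowset via XMLA and kill sessions with a Cancel command, none of which is shown here:

```python
# Sketch of a rudimentary query-governor decision: when the server is under
# CPU pressure, pick the heaviest open sessions to kill. Thresholds and
# session fields are hypothetical, for illustration only.

CPU_THRESHOLD_PCT = 90          # overall server CPU threshold
SESSION_CPU_MS_LIMIT = 60_000   # a session is 'guilty' above this much CPU time

def sessions_to_kill(server_cpu_pct, sessions):
    """sessions: list of dicts with SESSION_ID and SESSION_CPU_TIME_MS.
    Returns the ids of the worst offenders when the server is under pressure."""
    if server_cpu_pct < CPU_THRESHOLD_PCT:
        return []
    guilty = [s for s in sessions if s["SESSION_CPU_TIME_MS"] > SESSION_CPU_MS_LIMIT]
    # kill the heaviest sessions first
    guilty.sort(key=lambda s: s["SESSION_CPU_TIME_MS"], reverse=True)
    return [s["SESSION_ID"] for s in guilty]

sessions = [
    {"SESSION_ID": "A", "SESSION_CPU_TIME_MS": 500},
    {"SESSION_ID": "B", "SESSION_CPU_TIME_MS": 120_000},
    {"SESSION_ID": "C", "SESSION_CPU_TIME_MS": 95_000},
]
print(sessions_to_kill(95, sessions))
print(sessions_to_kill(50, sessions))
```

Nothing gets killed while the server is below the threshold; above it, only sessions over the per-session CPU limit are candidates.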

There’s other useful stuff available from schema rowsets too that doesn’t seem to be documented anywhere. If you look at the DISCOVER_SCHEMA_ROWSETS rowset (try running call assp.Discover("DISCOVER_SCHEMA_ROWSETS") if you have ASSP installed) you can see what schema rowsets are available. One interesting one I found was DISCOVER_PARTITION_DIMENSION_STAT; for example run

call assp.Discover("DISCOVER_PARTITION_DIMENSION_STAT", "<DATABASE_NAME>Adventure Works DW</DATABASE_NAME><CUBE_NAME>Adventure Works</CUBE_NAME><MEASURE_GROUP_NAME>Internet Sales</MEASURE_GROUP_NAME><PARTITION_NAME>Internet_Sales_2004</PARTITION_NAME>")

…on Adventure Works. What it’s showing is the partitioning and indexing information for an individual partition, information discussed in this SQLCat blog entry from earlier this year:
http://blogs.msdn.com/sqlcat/archive/2007/03/05/ssas-partition-slicing.aspx

This article sends you off to the info*.xml files created when you process a partition to get this information, but here it is ready to use. The last three columns the rowset returns are the important ones: they tell you whether each attribute is indexed in the partition and, if so, what the maximum and minimum members in the partition are. These maximums and minimums are expressed as DataIDs, AS’s own internal surrogate keys; you can find out what a member’s DataID is by using the (wait for it) DataID function, as follows:

with member measures.showdataid as
DataID([Date].[Date].currentmember)
select measures.showdataid on 0,
[Date].[Date].members on 1
from [adventure works]

When I first found out about this I thought it would be really cool to be able to visualise this information somehow so you could check to see whether your partition slicing was in a healthy state or not. For instance, if you look at the 2001 partition from the Internet Sales measure group:

call assp.Discover("DISCOVER_PARTITION_DIMENSION_STAT", "<DATABASE_NAME>Adventure Works DW</DATABASE_NAME><CUBE_NAME>Adventure Works</CUBE_NAME><MEASURE_GROUP_NAME>Internet Sales</MEASURE_GROUP_NAME><PARTITION_NAME>Internet_Sales_2001</PARTITION_NAME>")

…you can see there are no indexes at all. This is because there is so little data in the partition that it falls below the IndexBuildThreshold server property, which in turn means that every time you run a query that partition is going to be scanned. Another thing you might need to watch out for, if you’re not using the Slice property on each partition (and you really should be, even on MOLAP partitions, despite what some people say), is the scenario the SQLCat team describe where your partitions end up with overlapping DataID ranges, meaning that AS ends up scanning partitions unnecessarily when you run queries. I’m sure I could put together some kind of SSRS report showing all the partitions in a measure group side by side, perhaps using a stacked bar chart or something to represent the DataID ranges for a given attribute.
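Pending a proper SSRS report, a quick text-only stand-in gets the same idea across: draw each partition’s [min, max] DataID range for one attribute as a bar. A Python sketch, with made-up partition names and ranges:

```python
# Text-mode stand-in for a RangeColumn chart: one bar per partition showing
# its [min, max] DataID range for an attribute. Input values are invented.

def draw_ranges(slices, width=40):
    """slices: dict of partition name -> (min_dataid, max_dataid)."""
    lo = min(r[0] for r in slices.values())
    hi = max(r[1] for r in slices.values())
    scale = (width - 1) / max(1, hi - lo)
    lines = []
    for name, (mn, mx) in sorted(slices.items()):
        start = int((mn - lo) * scale)
        end = int((mx - lo) * scale)
        bar = " " * start + "#" * max(1, end - start + 1)
        lines.append(f"{name:<22}|{bar}")
    return "\n".join(lines)

print(draw_ranges({
    "Internet_Sales_2001": (1, 340),
    "Internet_Sales_2002": (341, 705),
    "Internet_Sales_2003": (706, 1070),
    "Internet_Sales_2004": (1071, 1280),
}))
```

Healthy slices show up as a neat staircase of non-overlapping bars; overlapping ranges are immediately obvious because two bars share the same columns.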

Precision Considerations for Analysis Services white paper

Another week, another white paper from the SQLCat team (do these guys ever sleep?) on the topic of inconsistent numbers being returned from different queries against the same cube as a result of precision issues:
It’s not as bad as it sounds, but definitely something to read before you get hauled up in front of the end users when they notice this! 

Maximising Attribute Relationship Aggregations in Changing Dimensions article

Erik Veerman has written a nice article for the Solid Quality Mentors (who I also do training for) newsletter on attribute relationships and the RelationshipType property:
 
As he says on his blog, if you want to see more articles like this you can sign up for the Solid Quality newsletter here:

One-to-One Attribute Relationships: Forks versus Chains

It’s a fairly common occurrence to have several attributes in a dimension describing the same thing: products can have long and short descriptions, for example, or you might want to display dates in different formats, or show the full names of states or just their abbreviations. AS doesn’t give you any functionality to specifically handle this – something I’ve heard from people who’ve used other OLAP tools is that it would be great to set up ‘aliases’ for attributes, rather like how you can set up translations – so you just have to design it into your dimension. One thing I’ve noticed, however, is that when you’re designing attribute relationships to cater for this scenario it’s tempting to do so in a particular way that leads to problems later on; at least I’ve done this in the past and seen several other people do the same thing, so I thought it was worth blogging about. 

Here’s an example Time dimension based on the Adventure Works relational data:

[Image: forktime – Time dimension with attribute relationships designed as a fork]

We’ve got a simple Year-Quarter-Month-Date structure, but we want to see the same structure once with a short description (eg ‘2004’ for the year) and once with a long description (eg ‘Calendar Year 2004’). There are two natural user hierarchies that reflect these two choices, but notice the way the attribute relationships are designed: they take no account of the fact that there’s a one-to-one relationship between the two sets of attributes. If you use BIDS Helper’s excellent ‘Visualise Attribute Lattice’ functionality you can see the relationships make a fork shape:

[Image: forklattice – attribute lattice for the fork design]

What’s wrong with this? Well, rather than answer that question, let’s look at my preferred alternative:

[Image: chaintime – Time dimension with attribute relationships designed as a chain]

Here’s the accompanying lattice:

[Image: chainlattice – attribute lattice for the chain design]

You can see that instead of a fork shape we’ve now got a long chain, and the one-to-one relationships I mentioned earlier are now in place; remember that this doesn’t affect the way the dimension appears to the end user at all. Incidentally, you might wonder, as I did, whether the Cardinality property of a relationship should be changed from ‘Many’ to ‘One’ for one-to-one relationships: it’s probably worth doing, but according to BOL it doesn’t have any effect at the moment:
http://msdn2.microsoft.com/en-us/library/ms176124.aspx

The advantages of this design are as follows:

  • More efficient autoexists. In the fork approach, if you crossjoin Year with Month Long Desc, AS will have to resolve that relationship through the Date attribute; in the chain approach, because there’s a direct relationship between the two, working out which members on Month Long Desc exist with Year is going to be faster. You’ll probably only notice this on fairly large dimensions though. UPDATE – see Mosha’s comment below, this isn’t correct.
  • In general, I reckon that the aggregation design wizard will produce better designs with the chain approach. Again, I’m not sure how much difference there will really be, but when I ran the wizard on a test cube containing just my fork dimension it came up with three aggregations: at Month/Quarter Long Desc; Month Long Desc/Quarter; and Year. When I did the same thing on a test cube containing just my chain dimension I got four aggregations: at Month; Quarter; Year; and All. In the latter case, of course, the Calendar Long Desc hierarchy can benefit from aggregations built on the non-Long Desc attributes, and overall the aggregation coverage seems better. Results, however, may well vary in the real world…
  • Scoping calculations becomes much, much easier with the chain approach rather than the fork approach. Imagine that you want to implement something like a Total-to-Date calculation, which sums up a measure from the beginning of Time to the currentmember on Time. Naturally you want it to work on both your user hierarchies, but you’ll find that with the fork approach it becomes fiendishly difficult to get your scoping correct – it’s a lot harder than you think, and in fact I’ve not come up with a satisfactory solution. With the chain approach however there’s a really elegant solution: because of strong hierarchies, every member you select on the Calendar hierarchy automatically forces the currentmember on the Calendar Long Desc hierarchy to the equivalent: so, selecting ‘2004’ on Calendar changes the currentmember on Calendar Long Desc to ‘Calendar Year 2004’. This means you can scope all your calculations on the Calendar Long Desc user hierarchy and they’ll automatically work on the Calendar hierarchy; for example, here’s how to do the TTD calc:

Create Member CurrentCube.Measures.TTD as null;

Scope(Measures.TTD);
    Scope([ForkTime].[Calendar].Members, [ForkTime].[Calendar Long Desc].members);
        This=sum({null:[ForkTime].[Calendar].currentmember}, [Measures].[Sales Amount]);
    End Scope;
End Scope;

(Thanks are due to Richard Tkachuk for answering some emails on these issues)

Scale-Out Querying with Analysis Services Using SAN Snapshots

Those people on the SQLCat team have been busy: yet another white paper is out, this time on scale-out using SAN snapshots. You can download it here:
 

SP2 Slowdown?

Despite all the performance improvements that have gone into AS SP2, several of my customers have reported worse query performance after migrating. Today I saw two new KB articles (frustratingly short on detail) that suggest that these performance problems may have been fixed:
 
Interesting…

AS2008 webcast now available

If, like me, you missed the recent AS2008 new features webcast, you can now download the recording here:
 
It’s a very good summary of what’s coming, although the content of the presentation contains no major new revelations. The question/answer session is a bit more revealing: we’ll be getting what sounds like very basic execution plans (possibly visual), there’s no LINQ integration as yet, and there are no changes to SSRS/SSAS integration (WHEN are the people in Redmond going to wake up and do something about this???). 

Cumulative Update Package 2 for SP2

For those of you who like to live life on the bleeding edge, the second cumulative update package for SP2 (build 3175) is now available for download here:

Some relationally-minded friends of mine have commented that these hotfix rollups always seem to contain more than their fair share of AS fixes, and I have to admit that they are probably right. I also heard about one welcome change that I believe is now in there: the AS dev team have got round to making the query-cancellation functionality a lot more responsive, so no more hours wasted waiting for the query to end after you clicked cancel. Hurray!

Creating local cubes with AMO

Interesting nugget from the MSDN Forum, in a post by Adrian Dumitrascu: you can create local cubes in AMO. Here’s the text of the post:
 
AMO (Analysis Management Objects, the object model for administering AS2005, the successor for DSO from AS2000) also works with local cube files (.cub). You can use it to create/alter objects or to refresh the data by re-processing every night, as you mentioned.
Sample code to connect to a local cube file:

using Microsoft.AnalysisServices; // the .dll is in "%ProgramFiles%\Microsoft SQL Server\90\SDK\Assemblies", also in GAC

Server server = new Server();
server.Connect(@"Data Source=c:\MyLocalCube.cub"); // this will create an empty local cube if not there already
… // do something, for example process cubes
server.Disconnect();

 
Makes sense that you can do this, when you think about it, given that everything’s XMLA behind the scenes.