Viewing Partition Slice Information

The other week I discovered that it was possible to view the partition slice information discussed here using the DISCOVER_PARTITION_DIMENSION_STAT schema rowset. I’ve always thought it would be cool to be able to visualise this information so you could easily check to see whether your dataid ranges overlapped or not; to do this, though, you would need to see the slice information for all your partitions in one resultset and DISCOVER_PARTITION_DIMENSION_STAT only returns data for one partition. So this week I dusted off my coding skills and wrote a sproc that loops through every partition in a measure group, calls DISCOVER_PARTITION_DIMENSION_STAT and concatenates the results as well as checking to see which partitions have ranges which overlap each other. I’ve added it to the source code for the Analysis Services Stored Procedure Project, which you can get hold of here:

http://www.codeplex.com/ASStoredProcedures

Note that the function isn’t part of the latest release, you will have to download the code and compile it yourself. As an example of what it does, in SQLMS if you connect to the Adventure Works database and then run the following statement:

call assp.DiscoverPartitionSlices("Adventure Works", "Internet Sales")

…you’ll get a resultset showing all the partition slice information for every attribute on every partition in the Internet Sales measure group.

Having done this I was able to put together a simple SSRS report to display this information. I say simple, it probably took me longer to create the report than to write the sproc because I couldn’t work out how to visualise the ranges in a useful way; in the end I used the RangeColumn chart style from Dundas, which seemed to be the best fit. Here’s an example of what the report returned for the Date attribute on the Date dimension and the Internet Sales measure group:

sliceDate

I cheated a bit by setting the IndexBuildTheshold very low so that indexes would be created for the 2001 and 2002 partitions, but it shows what a healthy set of slices looks like (the Internet Sales measure group is partitioned by Calendar Year). Another attribute worth looking at is the Product attribute on the Product dimension:

sliceProd

What we see here is that in 2001 and 2002 only a small number of products were actually sold compared to 2003 and 2004. In this situation it might be worth trying to reorder the members or set a slice on 2001 and 2002 so that AS knows about this distribution: it would mean that if you were running a query for the sales of a Product that was only sold in 2003/4 for all time periods, AS would not need to scan the 2001 and 2002 partitions because it would know in advance that there would be no data for that product in those partitions. One of the improvements I would like to make to the sproc is to display member names and unique names as well as dataids so that the slice ranges are easier to understand; Mosha also had the idea that it would be good to be able to take these unique names and use them to set a slice on a partition for exactly the scenario I’ve just described.

Lastly, I found what is almost certainly a bug in AS when I was looking at the Calendar Semester attribute on the Date dimension:

sliceCalSem

Calendar Semester has, as you would expect, a one-to-many relationship with the Year attribute and we would expect to see no overlaps as a result -but as the report shows, that isn’t the case. When you run a query on this attribute, for example:

select measures.[internet sales amount] on 0,
{[Date].[Calendar Semester].&[2004]&[1]} on 1
from [adventure works]

You see partitions being hit when they shouldn’t be hit – in this case the 2003 and 2004 partitions – and this even sometimes seems to happen when you have a slice explicitly set on the partition using the Year attribute, which is pretty bad. Greg Galloway did a bit of digging on this and discovered that the SQL generated returned rows in the same mixed-up order as the dataids of the members, and fixing the SQL to return members in the order you want the dataids stops this happening to a certain extent. Mosha says that it’s a problem with the decode tables created within the dimensions, but whatever the cause you may want to check your dimensions to see if it’s happening to you. This kind of problem has been noticed before and I knew about the Disable Prefetch Facts connection string property that was introduced to deal with it, so I wonder if this is related in some way? Anyway, I logged the bug on Connect and I’ll blog about any new information that I get on it:

https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=308793

6 thoughts on “Viewing Partition Slice Information

  1. Thanks for this great article on partition slice information. I\’ve been relying on examining the INF*.XML files, which is a much more painful way to go about this process. I\’m going to imliment your ideas, and see if I can\’t expand on them a little. I\’ll post backi any significant improvements I find.
     
    On your closing note on the "partition bug". I raised a ticket with Microsoft some months ago, and have been engaged in a regular battle ever since. i provided them with a detailed explanation, a businesss justification and suggestions. It seems that they just can\’t get it. I\’ve been leading with them to connect me with the Query engine and the Storage engine developers, so that I can explain the benefits to someone who understands. (Last email went out last week.)
     
    Basically, all they need to do is to allow you to sort attributes as they are processed by the storage engine, so that the ordinal come in sorted. Microsoft has suggested that I manually load the dimension information (i.e. via SSIS packages), but the effort involved and the number of cubes I\’d then have to manually manage make this unworkable.
     
    I did manage to create the right indexes in SQL Server to trick the SQL optimizer into sending the rows back roughly sorted. It\’s much better (I have 100 partitions and the extraneous partition scans have dropped 75%), but this of course isn\’t very robust or stable. It\’s very much subject to change.
     
    Why this is such a battle, when the benefits are quite obvious I can\’t understand. MSAS is a wonderful piece of technology, but Microsoft is going to have to fix their support processes.
     
    I\’d be happy to provide you with the specifics, if it\’ll help get Microsoft implement this smal change.

  2. This is an awesome post. Very, very good stuff.
     
    I too have been fighting slice issues recently and am curious to give this a shot.
     
    Thanks again.

  3. Hi Chris
     
    Thanks !. It is exactly what I was looking for. Your improvement could be made by adding Dataids to an array and then executing below mdx to retrieve unique names and keys, just once record loop is finished. I\’ve put three sample DataIds:
     
    With

    Member Measures.MemberKey as [Date].[Calendar Year].currentmember.Properties("Key")
    Member Measures.MemberName as [Date].[Calendar Year].currentmember.Properties("Name")
    SELECT
    {Measures.MemberKey, Measures.MemberName} ON 0,
    FILTER([Date].[Calendar Year].members, DataId([Date].[Calendar Year].currentmember) = 2 OR
    DataId([Date].[Calendar Year].currentmember) = 3 OR
    DataId([Date].[Calendar Year].currentmember) = 4 ) on 1
    FROM [Adventure Works]
     
     
     

  4. I’ve just been suffering with this issue, Thanks for the procedure, it has been very useful!

    I’ve resolved some of my issues by creating some indexes on the source table for the columns used in the select statement issued by SSAS when processing the dimension attributes. This meant that my attributes where in the correct order so the dataids are now predictable.

    In doing this I thought that a good solution to this whole issue is for the Analysis Services team to make a small change to the dimension processing routine to include an order by statement in the select distinct used. The order by would be for the field used to sort the attribute members whether it be at the key, name, attributekey or attribute name. These values are all included in the column list for the select (for obvious reasons so it should be a simple change for quite a big benefit.

    I don’t know how to suggest this change to microsoft but I thought you might possibly have a bit more gravitas if you also think it is a good idea.

    Any comments welcome!

Leave a Reply