Comparing MDX Query Results With Tablediff

Recently I was working on a project where the customer wanted to upgrade from SP1 to SP2, but found that when they did so the results of some of the calculations on their cube changed. Unfortunately these changes weren’t easy to spot: their MDX Script was >1500 lines long, and the bugs they were running into depended on the query being run and on whether the cache was cold or warm (in fact one of them was the bug I blogged about here). When testing various workarounds and hotfix builds I ran into a problem I’ve hit on numerous occasions before: when I changed something in the MDX Script, how did I know it hadn’t broken something else?

In the past what I’ve done is take a few MDX queries, run them, copy and paste the results into Excel, then make my changes, rerun the same queries, paste the new results into Excel again, and use Excel formulas to compare the old and new resultsets. However, in this case that approach just wasn’t feasible, and I started looking for a better way of comparing large numbers of MDX resultsets. There is definitely a tool to be written here, probably one which runs batches of queries, serialises the cellsets and saves them to disk, and then compares the cellsets and highlights any cells with differences, but I didn’t have time to write it. I was talking the problem over with Didier Simon of PSS, though, and he made a great suggestion: use tablediff.exe, the command line utility you get with SQL Server for comparing relational tables. You can read more about it here:
http://msdn2.microsoft.com/en-us/library/ms162843.aspx

This turned out to be the key. I cannibalised bits of my Analysis Services cache warmer and created two SSIS packages. The first ran a batch of MDX queries, retrieved from a relational table (originally saved from a Profiler trace), through SQL Server using OpenRowset, saving the results in a table using a SELECT INTO. Here’s what the control flow looked like:

[Image: the CreateResults control flow]

And here’s an example of the kind of SQL query I was generating:

select identity(int, 1, 1) as idcolumn, * into Query1
from openrowset('MSOLAP', 'Data Source=localhost; Initial Catalog=Adventure Works DW',
'with cell calculation test for ''(*)'' as ''cstr(round(measures.currentmember, 5))''
select Measures.[Internet Sales Amount] on 0,
[Date].[Calendar Year].Members on 1
from [Adventure Works]')

A couple of things to note. Tablediff needs a column which can uniquely identify each row in the tables it’s comparing, hence the identity column. I also found that I was running into the precision problems described here, and as a result tablediff was flagging cells as having different values when in fact they differed only by a minuscule amount, so I created a calculated cell assignment in the WITH clause which rounded every value to five decimal places. I could do this because I knew none of my queries had WITH clauses of their own, but another solution would have been to create a session-scoped calculated cell assignment which did the same thing.
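Incidentally, here’s a rough, untested sketch of what that session-scoped assignment might look like: a CREATE CELL CALCULATION statement executed once on the session, after which every query run on that session should get the rounding without needing its own WITH clause. The name [RoundForDiff] is just something I’ve made up for the example:

// Session-scoped version of the rounding assignment used in the queries above
CREATE CELL CALCULATION [Adventure Works].[RoundForDiff]
FOR '(*)'
AS 'CStr(Round(Measures.CurrentMember, 5))'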

After running this package I had a SQL Server database with as many tables containing results as I had queries. I could then make my changes to the cube and rerun the same package to dump a new set of results to a different database, and then run my second package to compare the results. This was, again, a fairly simple package to put together:

[Image: the compareresults control flow]

All I did was use a ForEach loop to loop over the tables in the database containing the first set of results, then use an Execute Process task to call tablediff to compare each table with the equivalent table in the second database. I used the -et argument to save the details of any differences it found to another table, the -c argument to get column-level differences, and the -b argument to specify the number of bytes to compare for large objects, which was necessary because the SELECT INTO creates an ntext column for all the member names on the Rows axis.
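To illustrate, the command line built for each iteration of the loop looked something like this (the database and table names here are made up, 2000 is just an arbitrary -b value large enough to cover the ntext column, and the ^ characters are only cmd.exe line continuations):

"C:\Program Files\Microsoft SQL Server\90\COM\tablediff.exe" ^
 -sourceserver localhost -sourcedatabase ResultsBefore -sourcetable Query1 ^
 -destinationserver localhost -destinationdatabase ResultsAfter -destinationtable Query1 ^
 -c -b 2000 -et Query1_Diff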

Overall, it’s not an ideal solution (coding a proper app is definitely the way to go) but it did the job, and hopefully this will be useful to someone else…

8 thoughts on “Comparing MDX Query Results With Tablediff”

  1. That is an interesting approach.  We also try to make sure our mission-critical reporting is not broken by upgrades.
     
    At my company our mission-critical SSRS reports against SSAS are render-able to both PDF and XML.  In order to make sure an SP doesn’t break something, we run reports and create a whole bunch of "before" XML snapshots.  Then we apply the SP, then we repeat the running of the same reports and create a whole bunch of "after" XML snapshots.  Finally we can XML/DIFF the "before" and "after" snapshots in order to assure ourselves that we aren’t breaking anything mission-critical.  The XML/DIFF points us to the precise measures and calculations that have caused problems.
     
    Again, we only go to this trouble for the mission-critical reports that are owned within the I.S. department.  If some obscure pivot table owned by some user is broken, they are S.O.L.  There is really no feasible way for us to test everybody’s pivot tables and spreadsheets for them.
     
    SP2 caused (and still causes) trouble for us as well… (http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=2557790)

  2. Hi David,
     
    I like your idea – it’s a bit more robust than mine, I think. Can you give me some more details on how you run the XML/DIFF? My XML knowledge is pretty poor…
     
    Chris

  3. The first step is to be able to send critical reports to XML.  Normally we would write a very stripped-down version of the report (RDL) which is specifically designed for the XML rendering extension (<Render><Extension Name="XML" … /></Render>).
     
    The next step is to find an XML diff program.  Microsoft has a command line version and a GUI version, both called "XML Diff & Patch".
     
    Finally you can accumulate "before SP" and "after SP" reports and run your diffs between them.  A regular diff tool may do the trick as well but it wouldn’t be as helpful identifying the particular columns that changed.
     
    I think the biggest challenge is determining what reporting is "critical" to an organization and, out of that, what reporting the I.S. department is accountable for.  Because OLAP is a new animal and users can do so much "software development" on their own, it is difficult to get everyone on the same page during an upgrade.
     
    Hopefully we are getting to the end of the Microsoft upgrades which actually change query results.

  4. Hi Chris, do you know of a good way to check whether two tabular cubes (loaded with data) agree on all measures? Or, if it’s not feasible to check all measures, maybe a random subset of them? Something like a randomized test? I’ve got two cubes with the same structures but loaded from two different databases, where one of them is a transformation of the other (to remove unnecessary fluff from some dimensions, which then forces us to remap certain columns in fact tables). What’s the best way to check whether the cubes agree? Many thanks.
