COP Databases

COP (Column Oriented Processing) databases seem to have been making something of a comeback recently. You may be wondering what a COP database is; the best short summary is from the OLAP Report’s glossary:

This is a class of database that is quite different to, but is nevertheless sometimes confused with, OLAP. COP databases perform high speed row counting and filtering at a very detailed level, using column-based structures, rather than rows, tables or cubes. COPs analyze detail and then occasionally aggregate the results, whereas OLAPs are largely used to report and analyze aggregated results with occasional drilling down to detail.

Although less well known and recognized than OLAP, COP databases have also been in use for more than 30 years (for example, TAXIR), which means that, just like OLAP, they predate relational databases. COP products are used for very different applications to OLAP and examples include Alterian, Sand Nucleus, smartFOCUS, Sybase IQ, Synera, etc.
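The glossary's description (counting and filtering detail-level rows using column-based structures rather than rows, then occasionally aggregating) can be illustrated with a toy sketch. This is not how Sybase IQ, Sand or any other product named above actually works. The sample data, column names and functions (`to_columns`, `count_matching`, `sum_matching`) are hypothetical, and real column stores typically add compression, bitmap indexes and a disk-level layout that this in-memory example ignores.

```python
from typing import Callable, Dict, List

# Hypothetical transaction-level sample data, stored row by row.
ROWS: List[Dict[str, object]] = [
    {"store": "London", "product": "milk",  "qty": 2},
    {"store": "Leeds",  "product": "bread", "qty": 1},
    {"store": "London", "product": "bread", "qty": 3},
    {"store": "London", "product": "milk",  "qty": 1},
    {"store": "Leeds",  "product": "milk",  "qty": 4},
]

Filters = Dict[str, Callable[[object], bool]]


def to_columns(rows: List[Dict[str, object]]) -> Dict[str, List[object]]:
    """Pivot a list of row records into one list per column."""
    names = rows[0].keys()
    return {name: [row[name] for row in rows] for name in names}


def match_mask(column: List[object], predicate: Callable[[object], bool]) -> List[bool]:
    """Scan a single column and mark which positions satisfy the predicate."""
    return [predicate(value) for value in column]


def combined_mask(columns: Dict[str, List[object]], filters: Filters) -> List[bool]:
    """AND together the masks for every filtered column.

    Only the columns named in `filters` are read; all other columns are never
    touched, which is why a column layout suits detailed filtering and counting.
    """
    n = len(next(iter(columns.values())))
    mask = [True] * n
    for name, predicate in filters.items():
        mask = [a and b for a, b in zip(mask, match_mask(columns[name], predicate))]
    return mask


def count_matching(columns: Dict[str, List[object]], filters: Filters) -> int:
    """Count the rows that satisfy every filter."""
    return sum(combined_mask(columns, filters))


def sum_matching(columns: Dict[str, List[object]], filters: Filters, measure: str) -> int:
    """Occasionally aggregate: total one measure column over the matching rows."""
    mask = combined_mask(columns, filters)
    return sum(v for v, keep in zip(columns[measure], mask) if keep)


if __name__ == "__main__":
    columns = to_columns(ROWS)
    london_milk: Filters = {
        "store": lambda s: s == "London",
        "product": lambda p: p == "milk",
    }
    print(count_matching(columns, london_milk))        # 2
    print(sum_matching(columns, london_milk, "qty"))   # 3
```

The counting query reads only the `store` and `product` columns, and the aggregation additionally reads `qty`; in a row layout every whole record would have to be visited, which is the trade-off the glossary is pointing at.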

As Nigel says, they have been around for ages but have never seemed to be as popular as OLAP. I have no idea why, because the technology is fundamentally good: my very first project using OLAP Services, back in the SQL Server 7 days, was an attempt to replace a proprietary COP database that was still desktop-based and not even 32-bit code, and we still struggled to match its performance on some queries. Maybe it just needed a big software company to buy into the sector, the way Oracle and Microsoft did with OLAP, for it to take off. There are certainly some big companies using it, though; for example Tesco, the biggest supermarket chain in the UK:

http://www.sand.com/resources/casestudies/tesco-dunnhumby/

Anyway, the prompt for this blog entry was several people asking me about COP databases over the last few months and then, this morning, reading this entry on Shawn Rogers’ blog about a new COP product called Paraccel:

http://www.b-eye-network.com/blogs/rogers/archives/2007/10/paraccel_and_su.php

It got me thinking… should Microsoft consider buying or developing a COP database? I have no idea whether it makes sense or not, but you could integrate it as another engine within Analysis Services and perhaps even create a hybrid of OLAP and COP. People often complain about how bad AS is at querying transaction-level data, and at the moment it is very difficult to get good performance from drillthrough; I wonder whether a COP engine would help here. Certainly Paraccel’s AMIGO feature, which lets it sync with an existing relational database, sounds very much like processing a cube (only faster); give it the ability to be queried in MDX and think of the fun… And in the long run, maybe all this stuff should be more closely integrated with the relational database, as Oracle are doing.

One thought on “COP Databases”

  1. Hi Chris,
    A startup by Stonebraker (of Postgres fame) is a recent entrant
    in this space: http://www.vertica.com/
    The Big-3 are typically reactive when it comes to fundamental shifts
    at the storage level, and efficient COPs will require a bottom-up design
    starting from the disk level. I bet they have people looking at this and
    will plunge in once they get an estimate of the real $s in this space. They
    also have the added responsibility of providing a staged migration
    story from legacy/traditional storage to this new format.
    +R
