MDX

LINQ to MDX

Seeing that Marco Russo has released his book "Programming Microsoft LINQ" reminded me of a conversation I had with him a while ago about something I’ve heard various people ask about over the last year – will there be a LINQ to MDX?

Before we go on, I should state that it’s my opinion that there isn’t a big enough market out there for anyone (Microsoft or a third party) to justify spending time developing LINQ to MDX. That’s not to say that I wouldn’t want to see it – I would – just that I doubt anyone much would use it. MDX remains too much of a niche language, and off-the-shelf tools work well most of the time so there’s less need to write custom MDX-generation code. As far as I know Microsoft isn’t planning on developing LINQ to MDX and I’d be surprised if it ever did, so this will remain a theoretical discussion.

But for the sake of argument if you were to implement LINQ to MDX the main problem you’d have to tackle would be the same one you have with using MDX in Reporting Services and Integration Services: MDX can’t guarantee fixed column names for any given query. However I had an idea on how to avoid this, and that is to think in terms of LINQ to MDX sets rather than LINQ to MDX queries. So for example if you take the following SQL query on a dimension table:

Select Year, Quarter, Month
From TimeDim
Where Year=2008 and (Month=March or Month=April)

that would then translate easily into the following MDX set expression:

{([TimeDim].[Year].[2008], [TimeDim].[Quarter].[Q3], [TimeDim].[Month].[March]), ([TimeDim].[Year].[2008], [TimeDim].[Quarter].[Q3], [TimeDim].[Month].[April])}

MDX sets have to be made up of tuples containing the same dimensionality, and if you think of a set’s dimensionality in terms of columns in a SQL SELECT statement then you can see how that might map onto LINQ concepts. Instead of using LINQ to create an MDX SELECT statement directly, you’d use LINQ to create MDX sets and then pass all of these sets into another function which would then run the query using the sets created as axes.

Since I’m no LINQ expert this was the point where I dropped a mail to Marco to ask him his opinion; he thought it was feasible and even came up with an idea of what the code might look like in C#:

var timeSet = from period in cubeBudget.TimeDim.Members
            where period.Year == 2008
                    && (period.Level == TimeDim.Level.Year
                        || period.Month == "March"
                        || period.Month == "April" )
              select period;

Var measures = from measure in cubeBudget.Measures
               Where new string[] { "Sales", "Quantity", "Price" }.Contains( measure.Name )
               select measure;

Var query = from measure in measures
            from period in timeSet
            select new { period.Name, measure.Name, measure.Value };

var cellset = from cell in cubeBudget
              where cell.Columns( measures )
                    && cell.Rows( timeSet )
              select cell;

In the code above, timeset would define the set of members on the Period dimension you wanted to use and measures would return the set of measures. Then you could use them in two ways: query would return a flattened rowset and cellset would return a cellset. But this all seems very convoluted and probably just as confusing to the average developer as raw MDX.

Another alternative approach would ignore MDX altogether and query Analysis Services using SQL directly (see http://cwebbbi.spaces.live.com/blog/cns!7B84B0F2C239489A!751.entry for more details on this topic), although I’m sure you’d run into the limitations of the SQL that is supported very quickly and in any case you lose all of the flexibility and functionality of the MDX language when you do this.

Maybe if we’re looking for a way to programmatically generate MDX then LINQ isn’t the way to do it. It’s something olap4j is working towards and they have taken an approach that is much more in tune with the multidimensional nature of MDX. One of the ‘open issues’ that caught my eye in the section of the olap4j spec that deals with this functionality is the question "Is this API at the right level, or is it too close to MDX?", meaning I guess that it would be all too easy to come up with an interface that is just as complex as MDX itself. Where would you draw the line between ease-of-use and functionality? Is it the MDX language that is confusing for people, or the multidimensional concepts (concepts that any interface would have to reflect) that underpin it? Can you abstract MDX to an interface to make it easier to use? I wish I knew more Java so I could test drive olap4j – is there anyone reading this who is using it? As always, I’d be interested to hear anyone’s thoughts on this matter so please leave some comments…

8 thoughts on “LINQ to MDX

  1. Hi Chris,
     
    As you mentioned, there is always a problem of getting the API close to MDX or making it closer to SQL. The core problem is that the olap-way-of-thinking is not so trivial and Marco\’s C# code is the perfect proof: Comparing to Linq To SQL or Linq To Entities, Linq To MDX is not intuitive. It looks like a work-around to write MDX query as Linq. Look how many code you have to write to declare a very simple multidimensional query.
    I still think that as for now, MDX is the most convenient way of writing multidimensional queries. I don\’t see myself writing long Linq to MDX code when I write the same query in MDX in few rows. The problem of understaing the multidimensional model will always be.
    As for Olap4j – it seems very nice but it has the same problem: The generation process of queries seems to be very hard and long. Common functions such as CrossJoin() are missing in the API. This project has to grow more in order to make our life better.
     
    I know that I wear the black hat of thinking on my head this time, but I\’ll be happy to see open source Linq to MDX. Maybe I\’ll see I\’m wrong and I\’ll love it.

  2. Chris,
     
    every now and then I tinker with the idea of "end-user query and ETL tool" that will (finally) deliver the promised end-user autonomy for BI users. The basic logic of the tool would stem from the fact that most of the reporting requirements can easily be decomposed into set operations ("set" being just a set of business entities, not MDX sets..).
     
    For example if you go for this one: "Give me the February sales by customer for those customers that made their first purchase with us in January". This would be pretty tough SQL to produce for most of the end users using any given query tool.
     
    However, if they were given Excel, they would pretty quickly come up with a solution that involves sorting, autofiltering, lookups and creating worksheets with names like "All customers", "January customers", "January new customers", "February sales", and so on. The end users use these "helper sheets" as sets to decompose the problem into manageable subproblems.
     
    So, the point … Apart for being possibly implementable as LINQ dialect, the set-based approach proposed by you is more aligned with the way people usually treat query problems.
     
    But I\’m not either saying that it would be ecomically feasible to produce "LINQ for MDX".
     
    Cheers,
    -Harri
     

  3. LINQ to MDX is a tough subject. On one hand MDX is closer to the functional language then most other query languages (i.e. SQL), just see how similar is MDX expression (represented by Server ADOMD.NET\’s Expression object) to the lambda expressions. But on the other hand multidimensional model imposed by the cube doesn\’t fit very well with LINQ. On one hand there is no way to represent implicit aggregates, hierarchical navigation – all of which are very natural in MDX/cube; on the other hand cube sets very strict limitations on what can and cannot be done – if the measure has Sum aggregation function, you cannot do Distinct Count or LastChild on it. And finally, the biggest difficulties with MDX is when it is used as a calculation definition language (i.e. inside MDX Script) as opposed to SELECT queries, which, I dare to say, are trivial in comparison to MDX Script…

  4. A very quick reading of the olap4j spec makes me think it\’s much more like the MD-API of the departed OLAP Council (a project I worked on as an independent consultant about 12 years ago), and what mdXML was working towards when that last had activity. In these models, you can programmatically construct and modify a query model as an object structure, then say "execute". In the case of the MD-API, it was specified in terms of COM and Java objects (I might be able to dig up a copy of the spec somewhere), and it could layer on top of MDX though MDX was/is much, much broader and deeper in capability. In the case of mdXML, the query model is an XML construct, and your favorite DOM object would be a good programmatic interface. Seemed a good idea, and I made a proof-of-concept to parse MDX into mdXML and transform mdXML into MDX 5 years ago. Someday, an object model for MDX will gain traction. I basically agree with Mosha on the way multidimensional doesn\’t mesh with LINQ as-is, but I\’ve put some time in wondering about about what would change with LINQ to be closer. The time may come someday…

  5. Now there is a solution for LINQ to MDX / SSAS problem on the market, checkout SSAS Entity Framework Provider here:
    http://www.agiledesignllc.com/Products.htm

    It’s usage is quite simpler than examples that Chris Webb gave us (though SSAS Entity Framework Provider does not allow server side pivoting yet, and maybe Chris Webb’s code examples might become helpful for that).

Leave a ReplyCancel reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.