R and F#

One of my New Year’s resolutions – or at least, something that got added to my list of stuff to do in the unlikely event I’ve got some time spare and can be bothered – was to learn more about statistics. I’ve only got a very basic grasp of the subject but, like data mining, it seems to promise to be incredibly useful in my line of work. However, it’s interesting that I’ve been working in BI for almost a decade and have never needed to learn much beyond basic stats. My theory is that stats, like data mining, tend to be used only by highly skilled quantitative analysts, whereas the people I work with are business people whose maths skills are very average and who quite rightly don’t trust analysis done using methods they can’t understand.

Anyway, in my browsing on the subject I came across R (see http://www.r-project.org/), the statistical programming language that has suddenly become a popular topic. I thought it might make an interesting blog entry, but today I saw John Brookmyre beat me to it, so I’ll just link to him instead:
http://blogs.conchango.com/johnbrookmyre/archive/2009/01/14/are-you-r-ing-yet.aspx

I also got interested in learning about F#, the functional programming language that will be included in VS2010 (for a good overview, see http://www.developer.com/net/net/article.php/3784961). I was struck by some similarities with MDX and began to wonder how it could be applied to BI; yet again, a quick Google revealed that Aaron Erickson had had the same idea and had blogged extensively and intelligently on the subject:
http://blog.magenic.com/blogs/aarone/archive/2008/04/23/On-Business-Intelligence-and-F_2300_.aspx
http://blog.magenic.com/blogs/aarone/archive/2008/09/07/F_2300_-Business-Intelligence-Case-Study-_2D00_-XBox-Live-Trueskill.aspx
http://blog.magenic.com/blogs/aarone/archive/2008/12/11/One-step-closer-to-F_2300_-for-Business-Intelligence.aspx

It’ll be interesting to watch the uptake of F# in BI; from what I can see there’s already a lot of activity around data manipulation and stats in F# (see, for example, Luca Bolognese’s blog), and I’m sure it’s only going to grow. My only complaint is that this is yet another addition to the Microsoft BI toolset, and I’m yet to be convinced there’s any kind of company-wide strategy for shaping all these tools into a coherent whole. F# won’t be the language of BI in the way that Aaron wants; it’s more likely to end up as a technology island, which is exactly what Aaron doesn’t want. But hey, the .NET guys have arrived at the party! The more the merrier.

6 thoughts on “R and F#”

  1. There’s another great language that can be good for statistical programming – Matlab, of course. Anyway, I really can’t see how these languages can replace a hierarchy-based language like MDX. How will you handle Siblings, Ancestors, etc.? I mean, you can do it in any language, but MDX handles this natively. Another point – what about aggregation management? How will you save and manage pre-built aggregations without cube storage? I think that F#, R and Matlab are good languages, but not for BI.

  2. It depends on what you mean by ‘BI’, of course – not everyone wants or needs a cube or aggregations or anything like that. So maybe these languages won’t be useful for the kind of BI you and I do, but they’ll be useful for other kinds of BI.

  3. Heh – watch out – the BI guys who stand at the gates of this topic with their pitchforks, cubes, MDX, and talk of storage and the need to couple business logic with their favorite database product might come after you :)… I kid, I kid… Seriously, having independent BI – that is, moving BI away from the realm of storage – is strangely threatening to some people. Let’s put it this way: the ability to aggregate things is exactly what FP does (in LINQ with C#, funnily enough, we have a method named "Aggregate"; we call it reduce in F#, but the whole point is to aggregate data using lambda calculus). To me, this is really about functional languages not tied to DB products (F#, Haskell, etc.) vs those that are (MDX, SQL).

  4. Let me take a bit of a provocative position, though, for a moment: surely when you’re working with data you can’t escape from storage? F# isn’t tied to a database, yes, but when you work with data in F# you’re storing that data in .NET’s own data structures, which are nowhere near as scalable or efficient as a full database product. So you have the choice of working with something like F# and having a rich language but weaker storage options, or with a database product that does storage well but has much more limited language options.

  5. I think, if anything, the XBox Live Trueskill implementation, which computes metrics on an ongoing basis over multiple terabytes of player data, demonstrates that storage is not as big a concern as some might think. That said, there is no reason why F# could not leverage SQL Server where appropriate – especially via something like mapping F# set operators over a good IQueryable implementation in SQL Server (i.e. L2S (LINQ to SQL), or something like it optimized for BI).

  6. Actually, I was thinking that Gemini would be a good candidate for F#: memory-based, super-fast, based on Analysis Services local cubes. If they can integrate it with Excel, why not .NET?
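The two technical threads in the comments, aggregating data with reduce (comment 3) and navigating hierarchies with functions like Siblings and Ancestors (comment 1), can be made concrete with a small sketch. It is written in Python rather than F#, purely for illustration: `functools.reduce` with a starting value behaves as a fold, which is the idea behind LINQ's seeded `Aggregate` and F#'s `List.fold` (F#'s `List.reduce` is the unseeded variant), and a parent-child dictionary stands in for a dimension hierarchy. All the data and names (`FACTS`, `PARENT`, `total_sales`, `sales_by`, `ancestors`, `siblings`) are hypothetical. Everything lives in memory, so the sketch says nothing about the storage and scalability questions raised in comment 4.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical fact rows: (product, region, sales). Illustrative data only.
FACTS = [
    ("Bikes", "Europe", 100.0),
    ("Bikes", "US", 250.0),
    ("Helmets", "Europe", 40.0),
    ("Helmets", "US", 60.0),
]

# Hypothetical parent-child product hierarchy: member -> parent (None marks the root).
PARENT = {
    "All Products": None,
    "Accessories": "All Products",
    "Bikes": "All Products",
    "Helmets": "Accessories",
    "Locks": "Accessories",
}


def total_sales(rows):
    """Fold the sales column into a single total (cf. LINQ Aggregate / F# fold)."""
    return reduce(lambda acc, row: acc + row[2], rows, 0.0)


def sales_by(rows, key):
    """Group-and-aggregate: fold rows into a dict of running totals per key."""
    def step(acc, row):
        acc[key(row)] += row[2]
        return acc
    return dict(reduce(step, rows, defaultdict(float)))


def ancestors(member):
    """Walk up to the root; loosely like MDX Ascendants() without the member itself."""
    result = []
    parent = PARENT[member]
    while parent is not None:
        result.append(parent)
        parent = PARENT[parent]
    return result


def siblings(member):
    """Members sharing this member's parent, including itself, as MDX Siblings does."""
    parent = PARENT[member]
    return sorted(m for m, p in PARENT.items() if p == parent)


if __name__ == "__main__":
    print(total_sales(FACTS))                    # 450.0
    print(sales_by(FACTS, lambda row: row[1]))   # {'Europe': 140.0, 'US': 310.0}
    print(ancestors("Helmets"))                  # ['Accessories', 'All Products']
    print(siblings("Helmets"))                   # ['Helmets', 'Locks']
```

Unlike an Analysis Services cube, nothing here is pre-aggregated: every total is recomputed by folding over the rows. That is the trade-off comment 1 raises about managing pre-built aggregations.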
