1 Day “Introduction to MDX Course”, Switzerland

Continuing my travels, I can also announce now that I’ll be running a 1-day version of my “Introduction to MDX” course at Microsoft Switzerland’s offices in Wallisellen, near Zurich, on March 16th. Full details and registration can be found here:
http://msevents.microsoft.com/CUI/EventDetail.aspx?EventID=1032437369&culture=de-ch

Despite what it says at the top of the page, the event will be in English – sadly, my German isn’t quite good enough to explain the concepts of MDX yet. And it’ll be nice to go back to the MS offices in Wallisellen: I used to work there, you know…

PASS Germany (Rhein/Main Chapter) Meeting on PowerPivot

I was just thinking the other day how long it’s been since I’ve done a user group meeting in the UK… the reason being, unfortunately, that I’m hardly ever at home except at weekends. Blame the rubbish state of the UK economy, the weak pound and the strong Euro. Anyway, I am pleased to announce that I’ll be speaking at PASS Germany’s Rhein/Main chapter on the 9th of February, doing a session called “Introduction to Self-Service BI with Office 2010 and PowerPivot”. More details can be found here:
http://www.sqlpass.de/Regionalgruppen/RheinMain/tabid/83/Default.aspx

Hope to see some of you…

Business Analysis Tool Desktop

Continuing my occasional series of reviews of SSAS client tools, I recently took a quick look at Business Analysis Tool Desktop from BIT Impulse, a company based in Ukraine. It’s aimed at the power-user market, which ProClarity Desktop used to dominate (and still does, to be honest, despite the fact it died several years ago, but I won’t go off on that rant again), and offers advanced analysis functionality for people who find Excel pivot tables too basic and restrictive.

It makes a good first impression – a nice, modern UI, with a look-and-feel that will be very familiar to users of ProClarity and also Tableau. To start, you need to create a ‘workbook’, which contains multiple ‘pages’, each of which can contain several different types of analysis.

The first page type can contain either a table, a table and a chart, or just a chart. Query building is accomplished by dragging and dropping hierarchies either onto the rows and columns of a pivot table, or onto a ‘shelf’ on the top of the pivot table, and this works very smoothly and intuitively. All of the advanced selection mechanisms you’d expect are present: you can select individual members, entire hierarchies or levels, descendants, and so on.

image

Complex filters of either the Rows or Columns axis, or specific hierarchies that have been selected, can also be built up using one or more conditions; similarly you can sort axes and hierarchies easily too, and do Office 2007-like cell highlighting to create heatmaps. Once the query has been executed, there’s a nice feature whereby you can hide some or all of the real-estate connected to query building such as the lists of dimensions and measures and the ‘shelves’; I also liked the way it was possible to resize rows and columns in the grid to make the layout clearer.

image

The other page types include a treemap:

image

…and a rather cool scatter graph that can be animated to display changes in data over time, and which I spent quite a bit of time playing with:

image

Overall, it’s certainly a strong competitor in its sector and worth checking out if you’re in the market for this type of tool – I liked it. I wouldn’t say it was miles better than any of the other tools like it that I’ve reviewed in the last year, but it’s definitely no worse and has its own particular strengths.

Solving the ‘Events in Progress’ Problem in DAX, Part Two: Aggregating Dates and Multiselect

You may recall that by the end of last week’s post on solving the ‘events in progress’ problem in DAX, I’d managed to get the right values out of PowerPivot at the Date level with good performance. But when, for example, I selected a Calendar Year I got no values out, and I hadn’t even thought about how to handle situations such as when Date was on filter and the user had selected three different, non-consecutive dates. Well, that was last week, when I was a DAX newbie – this week I’m slightly less of a newbie and I’ve got the solution.

First, let’s restate the problem a bit more clearly. What I want is this: when I have a date on columns or rows, I want my calculated measure to display the number of rows in the FactInternetSales table that have an OrderDate less than or equal to that date and a ShipDate greater than or equal to that date. This way I see the number of rows in the table where something has been ordered but not yet shipped. When I select a year, I want to see the number of rows where the OrderDate-ShipDate range intersects with the dates in that year. So if I select the year 2002, I want all the rows where either the OrderDate is in 2002, or the ShipDate is in 2002, or the OrderDate is before 2002 and the ShipDate is after 2002. Finally, if I make a multiselect when Date is on a filter – for example, if I multiselect the 1st, 5th and 10th of January 2002 – I want to count the number of rows where the OrderDate-ShipDate range includes any of those three dates.
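To make the intended behaviour concrete, here’s a minimal Python sketch of the counting logic described above (the fact rows and selected dates are made up purely for illustration): a fact row is counted once if its OrderDate-to-ShipDate range contains at least one of the selected dates.

```python
# Illustrative sketch only - not DAX, and the data below is invented.
# A row counts if at least one selected date falls in [order_d, ship_d].
from datetime import date

def events_in_progress(fact_rows, selected_dates):
    """Count rows whose order-to-ship range intersects the selected dates."""
    return sum(
        1 for order_d, ship_d in fact_rows
        if any(order_d <= d <= ship_d for d in selected_dates)
    )

fact_rows = [
    (date(2002, 1, 1), date(2002, 1, 8)),    # in progress on 1st and 5th Jan
    (date(2002, 1, 9), date(2002, 1, 12)),   # in progress on 10th Jan
    (date(2001, 12, 1), date(2001, 12, 15)), # shipped before any selected date
]
selection = [date(2002, 1, 1), date(2002, 1, 5), date(2002, 1, 10)]

print(events_in_progress(fact_rows, selection))  # prints 2
```

Note that the first row is in progress on two of the selected dates but is still only counted once, which is exactly why the multiselect total is not the sum of the individual dates’ values.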

I spent several hours thinking how I could do this in DAX using just the columns I had and I came up with the following formula, which gave me the results I was expecting… when it ever returned, because it was horrendously slow:

=CALCULATE(
    COUNTROWS('FactInternetSales'),
    FILTER('FactInternetSales',
        COUNTROWS(
            FILTER(
                VALUES('DimDate'[DateKey]),
                ('FactInternetSales'[OrderDateKey] <= 'DimDate'[DateKey])
                && ('FactInternetSales'[ShipDateKey] >= 'DimDate'[DateKey])
            )
        ) > 0
    )
)

It’s using the Calculate function to evaluate the expression CountRows(FactInternetSales); what the outermost filter is trying to do is filter all of FactInternetSales and find all the rows where there is at least one Date in the current selection that is between the OrderDateKey and the ShipDateKey. But, as we learned last week, doing a filter on all the columns in a table is very inefficient, and what we actually want to do is filter on just the columns we’re interested in: OrderDateKey and ShipDateKey. However, what I found was that I really needed all of the distinct combinations of OrderDateKey and ShipDateKey, but I could only get either all of the columns in a table using the table name, or the distinct set of values in a single column using the Values() function. Therefore I couldn’t actually filter on the distinct combinations of OrderDateKey and ShipDateKey where OrderDateKey<=DateKey<=ShipDateKey. This seems like a limitation of DAX from what I can see, though it might turn out that there is a way of doing it – if there is I will of course blog about it.

Anyway, the solution was to create a new column that did contain the distinct combinations of OrderDateKey and ShipDateKey. I went into the PowerPivot designer and created a new column called OrderShip on FactInternetSales with the following expression:

=CONCATENATE('FactInternetSales'[OrderDateKey], 'FactInternetSales'[ShipDateKey])

As you can see, it simply concatenates the OrderDateKey and ShipDateKey values into a 16-character string:
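As a sanity check on the string round trip, here’s a small Python sketch (with made-up key values) of what the OrderShip column does: the DateKey values in AdventureWorks are 8-digit yyyymmdd integers, so concatenating two of them gives a 16-character string from which the first and last 8 characters recover the originals.

```python
# Python equivalent (illustrative only) of the OrderShip calculated column and
# of the VALUE(LEFT(...)) / VALUE(RIGHT(...)) expressions used to unpack it.
# The key values below are invented 8-digit yyyymmdd integers.
def make_ordership(order_key, ship_key):
    return str(order_key) + str(ship_key)   # like DAX CONCATENATE

def split_ordership(order_ship):
    order_key = int(order_ship[:8])         # like VALUE(LEFT(..., 8))
    ship_key = int(order_ship[-8:])         # like VALUE(RIGHT(..., 8))
    return order_key, ship_key

os_value = make_ordership(20020101, 20020110)
print(os_value)                   # prints 2002010120020110
print(split_ordership(os_value))  # prints (20020101, 20020110)
```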

image

I was then able to use that column in a filter as follows, extracting the OrderDateKey and the ShipDateKey from the string using the Left and Right functions, and turning them back into numbers using the Value function:

=Calculate(
    CountRows(FactInternetSales),
    FILTER(
        VALUES('FactInternetSales'[OrderShip]),
        COUNTROWS(
            FILTER(
                VALUES('DimDate'[DateKey]),
                ('DimDate'[DateKey] >= VALUE(LEFT('FactInternetSales'[OrderShip], 8)))
                && ('DimDate'[DateKey] <= VALUE(RIGHT('FactInternetSales'[OrderShip], 8)))
            )
        ) > 0
    )
)

The logic is the same as with the previous expression, with the all-important exception that the outer Filter only works on the distinct values of OrderShip. It works and it’s fast, so I’m relatively happy although I would like to find a more elegant solution.

It not only gives the correct values for individual dates:

image

but also for Years (and other aggregations of Dates – also notice how the sum of the years is greater than the Grand Total, because of course an order can be in progress in two different years and so needs to be counted in each):

image

and also for multiselect (where again the value is not the sum of the values of the three individual dates):

image

image

I think this illustrates one area where DAX is superior to MDX. MDX is notoriously bad at handling multiselect, but as Marius explained to me, in DAX multiselect support is built-in: the Values() function is rather like a CurrentSet function, returning the set of all selected (‘existing’ in MDX terms) values in a column in a particular context.

So, another problem solved… time to think up the next challenge!

Solving the ‘Events in Progress’ Problem in DAX

One problem I’ve spent a lot of time thinking about over the last couple of years is what I call the ‘events in progress’ problem. The problem is this: if you have a fact table where each row represents an event, and there are two date columns representing the start and end dates of the event, on any given date how many events were actually in progress (ie how many rows are there where your given date is between the start and the end date)? It’s trivial to solve in SQL, but in SSAS and MDX there’s no truly satisfying solution. Richard Tkatchuk wrote up the best solution for SSAS here, but since it still involves a fair amount of hacking of the data I’ve never felt it was particularly elegant; I’ve always felt like there should be a pure MDX solution, but all of the approaches I’ve tried have been unusably slow. As a result, I was curious to see whether there would be an elegant way of solving this problem in PowerPivot and DAX…

It’s quite easy to use the AdventureWorks DW database to illustrate the problem. The following SQL query returns the number of orders that had been placed but had not yet shipped on any given date:

select DimDate.DateKey
, COUNT(*) as OrdersInProgress
from DimDate, FactInternetSales
where DimDate.DateKey >= FactInternetSales.OrderDateKey
and DimDate.DateKey <= FactInternetSales.ShipDateKey
group by DimDate.DateKey
order by DimDate.DateKey asc

image

The first thing to do in PowerPivot is, obviously, to import the DimDate and FactInternetSales tables. However the next step is less than obvious: you must not create a relationship between the two tables. Even though PowerPivot suggests you should, you won’t get this solution to work if you do – and as a result, in my opinion, it would be great if this message could be turned off (vote here if you agree).

image

Next we need to create a new calculated measure in the FactInternetSales table, and we need to come up with a DAX expression for it. Even though the logic is fairly simple, I struggled at first because the Calculate statement doesn’t allow you to use the Min or Max functions directly in one of its filter parameters. Then, though, I realised that the same limitation doesn’t apply to the Filter function and I came up with the following, which worked:

=COUNTROWS(
    Filter(
        Filter('FactInternetSales',
            'FactInternetSales'[OrderDateKey] <= MIN('DimDate'[DateKey])),
        'FactInternetSales'[ShipDateKey] >= MAX('DimDate'[DateKey])
    )
)

Unfortunately, although it was faster than any of my MDX approaches in SSAS, it was still not really fast enough. So I went back to my friends at Microsoft, and Marius Dumitru (again) came up with the following which is really fast:

=Calculate(
    CountRows(FactInternetSales),
    Filter(Values(FactInternetSales[OrderDateKey]), Min(DimDate[DateKey]) >= [OrderDateKey]),
    Filter(Values(FactInternetSales[ShipDateKey]), Max(DimDate[DateKey]) <= [ShipDateKey])
)

image

The next problem is that this only works when something is explicitly selected from DateKey. When you put Calendar Year on rows or columns though, for example, it returns nothing. I’m currently working on getting this fixed, but that will be another blog post!

Anyway, a couple of questions need to be asked at this point. First of all, how/why does this work? Well, Marco has just posted a great explanation of the Calculate statement here that I recommend you read and reread; it covers the theory behind this expression and a lot more.

Secondly, why is this second expression so much faster than the first? Here’s what Marius told me: in the first expression the filtering is done on each individual row in the fact table. In the second expression the filters act on the distinct values from both OrderDateKey and ShipDateKey, and since the number of distinct values is smaller than the overall number of rows this is evaluated very quickly; the results of these two filters can then be used in a further, very efficient request to the PowerPivot/Vertipaq engine. In Marius’ own words, the takeaway is that “it’s always a best practice to avoid filtering an entire table if the filter condition involves only some of the table columns (i.e. when no key/unique columns are used in the condition expression). The way to go is to only filter the columns involved in the condition (using Filter over All, Values or Distinct applied to the desired columns), and use those filter results as parameters to Calculate to carry out the rest of the computation”. Note that this isn’t a matter of block computation vs cell-by-cell, because all calculations in DAX are evaluated in block mode.
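Marius’ point can be illustrated outside DAX too. The following Python sketch (toy data, not VertiPaq internals) shows the two query shapes: evaluating the range test once per fact row versus once per distinct (OrderDateKey, ShipDateKey) combination and then adding up the row counts. Same answer, far fewer predicate evaluations when the fact table has many duplicate key pairs.

```python
# Toy illustration (not how VertiPaq actually works) of filtering distinct
# column values instead of whole rows. Keys and the selected date are made up.
from collections import Counter

fact = [(1, 3), (1, 3), (2, 5), (2, 5), (2, 5), (4, 6)]  # (order, ship) key pairs
selected = {3}  # the date keys currently in context

def in_progress(order_key, ship_key):
    return any(order_key <= d <= ship_key for d in selected)

# Slow shape: run the predicate on every single fact row.
slow = sum(1 for o, s in fact if in_progress(o, s))

# Fast shape: run the predicate once per distinct pair, then add up row counts.
counts = Counter(fact)
fast = sum(n for (o, s), n in counts.items() if in_progress(o, s))

print(slow, fast)  # prints 5 5
```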

So there you go, the first DAX optimisation tip on this blog!

SQL Academy event in Dublin

Just a quick note to mention I’ll be doing a half-day seminar on various intermediate-level SSAS topics in Dublin on February 16th. It’s part of Microsoft Ireland’s SQL Academy series, and you can find the agenda and register (it’s free to attend) here:

http://msevents.microsoft.com/CUI/EventDetail.aspx?EventID=1032428016&Culture=en-IE

Fifth Blog Birthday

So my blog birthday has come around again, and five years seems like some kind of milestone – I’m surprised anything in IT lasts five years. But here I am still going…

It’s been a pretty good year. Half of it was taken up with writing “Expert Cube Development with Analysis Services 2008” with Marco and Alberto, which was published in the summer; I was very happy with how it turned out, and we got some really good reviews on various blogs (see here for a list) as well as six five-star reviews on Amazon.com. Even the one bad review we got on Amazon, in my opinion, proves that we achieved our objective to write a book specifically for more advanced users rather than beginners. We heard the other day that we’ve already sold more than a thousand copies which I don’t think is bad at all for a book with a relatively small target readership.

The blog itself did quite well too: I was very proud when I won an award for best BI blog post in the PASS Log Reader Awards for my post on SSRS Drilldown, and I was equally chuffed when Donald Farmer named me in his top ten SQL bloggers of 2009 the other day. It’s been harder to find SSAS and MDX-related issues to write about because, let’s face it, there hasn’t been much new SSAS and MDX functionality in the past few years, but I’ve got a lot of PowerPivot and DAX posts planned and there’s plenty of new stuff in the wider MS BI stack that will be worth a look. So far I’ve resisted the temptation to move with the times and start Twittering as well, not because I have anything against it but more because I’m sure I’d like it too much, and I spend far too much time tapping away at my laptop already.

Business-wise things have certainly been slower in 2009 than they were in 2008 or 2007, which was only to be expected. Luckily the quietest months for work were also the months when I was working hardest on the book, and it now looks like things have turned a corner (for the European market, if not the UK) and I’m extremely busy once again. I’m not sure whether PowerPivot and DAX will present many opportunities for me to make money from consultancy, but given that I’ve been doing a lot more training recently I’ll probably set aside some time to write a course on them too; I’ll be on the lookout for other opportunities to diversify away from my core expertise of Analysis Services and MDX, but I don’t think I’ve found a new technology that really grabs me yet.

Last of all, I couldn’t let Mosha’s ‘farewell to BI’ blog post from earlier today pass without comment. I started working with OLAP Services as it then was during the beta for SQL Server 7.0, more than ten years ago now, and right from the beginning Mosha made an incredible effort to engage with the user community, giving us advice, answering our questions and taking the time to understand what we were trying to do with the product. His posts on the microsoft.public.sqlserver.olap newsgroup, and latterly on his blog, were a goldmine of information and I firmly believe his efforts were one of the main reasons why Analysis Services has been as successful as it has been; his example is one other development teams at Microsoft would do well to follow. Of course it’s not like he’s died or anything and I don’t want to sound as though I’m writing his obituary, but he’ll be missed and Bing’s gain is our loss.

Here’s to 2010!

Microsoft BI Conference 2010

Vidas just pointed out the following blog entry to me, about the Microsoft BI Conference 2010 being ‘co-located’ with TechEd North America:
http://blogs.msdn.com/usisvde/archive/2009/12/28/mark-your-calendars-for-microsoft-bi-conference.aspx

That means it’ll be taking place in June in New Orleans. I don’t think anyone was expecting it to reappear after its break last year so I’m pleasantly surprised, and the decision to run it parallel with TechEd is a good one in my opinion. Will I go though? I’d like to, but I’m already travelling way too much for work and I’m not sure I can justify the expense and yet another week away from my family…

Binding an Excel table to the results of an MDX query

I like using Excel tables, and one of the things that has mildly annoyed me in the past is that you can’t bind the results of an MDX query to an Excel table in the way you can with a SQL query. I mean, pivot tables are all very well, but sometimes a plain old table is a better fit – for example, think of the cool stuff the data mining addin can do with tables. Luckily, with a bit of hacking, you can do it… and here’s how.

The trick is to create a .odc file that contains the MDX query you want. What I did was create an odc file for a SQL query and then edit it in Notepad so that the connection string pointed to SSAS and the SQL query was replaced with an MDX query. To do this, open Excel and go to the Data tab, click From Other Data Sources and then choose From Data Connection Wizard. Choose Other/Advanced on the first step, then create a connection to the OLEDB source of your choice so long as it isn’t an SSAS source. The resulting odc file will then be saved to the My Data Sources folder; go there and edit it in Notepad.

The hacking is fairly easy to do – a quick inspection of the odc file format reveals that there’s a lot of junk and the important stuff is contained in a bit of XML near the beginning. Here’s an example of what that XML needs to be for an MDX query:

<xml id=docprops><o:DocumentProperties
  xmlns:o="urn:schemas-microsoft-com:office:office"
  xmlns="http://www.w3.org/TR/REC-html40">
  <o:Name>SSAS Query Test</o:Name>
 </o:DocumentProperties>
</xml><xml id=msodc><odc:OfficeDataConnection
  xmlns:odc="urn:schemas-microsoft-com:office:odc"
  xmlns="http://www.w3.org/TR/REC-html40">
  <odc:Connection odc:Type="OLEDB">
   <odc:ConnectionString>Provider=MSOLAP.4;Integrated Security=SSPI;
    Persist Security Info=True;Data Source=localhost;
    Initial Catalog=Adventure Works DW 2008</odc:ConnectionString>
   <odc:CommandType>MDX</odc:CommandType>
   <odc:CommandText>select {[Measures].[Internet Sales Amount],
    [Measures].[Internet Tax Amount]} on 0,
    [Date].[Calendar Year].members on 1 from [Adventure Works]
    </odc:CommandText>
  </odc:Connection>
 </odc:OfficeDataConnection>
</xml>

Once you’ve done this, you can go back to Excel, go to the Data tab and click Existing Connections to open the file:

image 

Choose to view this data in a table, and you get something that looks like this in your worksheet:

image

Incidentally, since you can query Analysis Services with a basic dialect of SQL, you can also bind an entire dimension or measure group (which are treated as ‘tables’ in SSAS SQL), or the results of a DMV, to a table. Here’s an example of what the XML for the odc file looks like:

<xml id=docprops><o:DocumentProperties
  xmlns:o="urn:schemas-microsoft-com:office:office"
  xmlns="http://www.w3.org/TR/REC-html40">
  <o:Name>SSAS Table Test</o:Name>
 </o:DocumentProperties>
</xml><xml id=msodc><odc:OfficeDataConnection
  xmlns:odc="urn:schemas-microsoft-com:office:odc"
  xmlns="http://www.w3.org/TR/REC-html40">
  <odc:Connection odc:Type="OLEDB">
   <odc:ConnectionString>Provider=MSOLAP.4;Integrated Security=SSPI;
    Persist Security Info=True;Data Source=localhost;
    Initial Catalog=Adventure Works DW 2008</odc:ConnectionString>
   <odc:CommandType>Table</odc:CommandType>
   <odc:CommandText>Adventure Works.$Source Currency</odc:CommandText>
  </odc:Connection>
 </odc:OfficeDataConnection>
</xml>

In this example I’m retrieving the entire contents of the Adventure Works Source Currency dimension. The output looks like this:

image

Since we’re all going to be doing a lot more reporting in Excel in the future, hopefully this tip will turn out to be useful to someone creating dashboards in Excel using SSAS data.

UPDATE: Greg Galloway just told me about another way of doing this: "If you have an ODC file you build a PivotTable, then if you drillthrough on a cell, it brings up a QueryTable with the drillthrough results. At that point, you can right click on the QueryTable and edit the query. It’s not elegant, but at least you don’t have to create an ODC file per query."
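Incidentally, if you end up creating a lot of these files, the Notepad step could be scripted. Here’s a rough Python sketch of the idea: the element names come from the odc samples above, but the helper function and sample strings are my own invention, so treat it as a starting point rather than a supported approach.

```python
# Hypothetical helper that patches the <odc:CommandType> and <odc:CommandText>
# elements inside an existing .odc file's text so that it runs an MDX query.
import re

def patch_odc(odc_text, mdx_query):
    odc_text = re.sub(r"<odc:CommandType>.*?</odc:CommandType>",
                      "<odc:CommandType>MDX</odc:CommandType>",
                      odc_text, flags=re.S)
    # Use a callable replacement so characters in the query aren't treated
    # as regex backreferences.
    odc_text = re.sub(r"<odc:CommandText>.*?</odc:CommandText>",
                      lambda m: "<odc:CommandText>" + mdx_query + "</odc:CommandText>",
                      odc_text, flags=re.S)
    return odc_text

# Made-up fragment standing in for a real .odc file's contents:
sample = ("<odc:CommandType>SQL</odc:CommandType>"
          "<odc:CommandText>select 1</odc:CommandText>")
print(patch_odc(sample, "select [Measures].[Internet Sales Amount] on 0 from [Adventure Works]"))
```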

SQLDay09 Belgium Videos Online

I recently had the pleasure of going to Belgium for the Belgian SQL Server User Group’s annual conference, and I’ve just heard that the videos for the sessions have now been posted online. You can see a complete list here:

http://sqlug.be/blogs/official_sqlugbe_blog/archive/2009/12/12/sql-server-day-2009-videos-now-available-on-chopsticks.aspx

I did two sessions: "Designing Effective Aggregations in Analysis Services 2008” and “Introduction to PowerPivot”. The aggregations session was the same one I did at PASS Europe and PASS USA and even though it was the third time I’d done it I still managed to mess up the timing and had to skip over the last few slides in double-quick time, hohum. Luckily they’re not all that important and I managed to cover the main points. Here’s that presentation:
http://www.microsoft.com/belux/MSDN/nl/chopsticks/default.aspx?id=1499

The PowerPivot session was the closing keynote, and as well as doing all the normal demos it gave me the opportunity to vent my positive and negative emotions about it (I’d like to stress that I was in general positive). I suspect I might have gone on a bit too long about that side of things initially, so apologies for that… Here’s the link:
http://www.microsoft.com/belux/MSDN/nl/chopsticks/default.aspx?id=1503

With a bit of luck the videos for SQLBits will be online soon too.