PASS European Conference and the Microsoft Architect Conference

Two new speaking engagements to mention: firstly, I’m happy to announce I’ll be doing two sessions at the PASS European Conference, which is taking place on April 21st-23rd in Dusseldorf, on DAX and SSAS cache-warming. You can view the whole agenda here:

I’m also going to be speaking on PowerPivot and how it fits into the wider corporate BI picture at the Microsoft Architect Insight Conference in London on the 31st of March. You can see the agenda for that event here:

As always, say hello if you see me…

Connecting to SQL Azure from Excel 2007

Sometimes I have an idea, spend a bit of time testing it out, and it ends up being a dead end. That’s what happened this evening but there’s at least one interesting bit of info that resulted so I thought I’d blog anyway…

My original thought was this:
* Excel 2007 can connect to SQL Server to retrieve data for use in reports in a worksheet
* SQL Azure is SQL Server in the cloud
* Office Web Apps gives us the ability to view Excel spreadsheets in the cloud
* So can I create an Excel spreadsheet that connects to SQL Azure, build a report using data from there, upload it to the Excel Web App and then refresh the connection so that my spreadsheet in the cloud displays live data from the cloud?

The short answer is no, at least not at the moment. But after a bit of trial-and-error I did get halfway there and manage to hook Excel 2007 up to SQL Azure (nb I’m not talking about using PowerPivot, which does work with SQL Azure, but the built-in Excel functionality). Here’s how:

  • None of the built-in functionality for connecting to SQL Server from Excel 2007 seems to work. However it is possible to connect to ODBC data sources from Excel and SQL Azure can be exposed as an ODBC data source.
  • So, in Excel, go to the Data tab and click on the “From Other Data Sources” and “From Data Connection Wizard”.
  • Select “Other/Advanced” and then the Microsoft OLE DB Provider for ODBC Data Sources and then click Next
  • Select the ‘Use Connection String’ option and paste the ODBC connection string that SQL Azure generates for you into the box.
  • Paste the value of the Uid property in the User name box, and put your password in the Password box. Delete the Uid and Pwd properties from the connection string.
  • Type the name of your database into the Initial Catalog box, then click OK
  • Finally a list of tables in your SQL Azure database appears; choose one, create an Excel data source and then create either a table or pivot table from the data. Click OK, enter your password one last time, and bingo!

Unfortunately, as I said, when I uploaded the resulting spreadsheet to the Excel Web App, I got the following error message:


Shame – I can understand why it makes sense for most external data connections not to be supported, but in this case, when the external data you’re connecting to is also in the cloud, it would be nice if an exception could be made.

One day, though, I’m sure a scenario like this will work. When I think about what Microsoft’s story for cloud BI might be like, the Excel Web App is the obvious candidate for the reporting tool. Whether you’re reporting direct from relational data stored in SQL Azure, or from some kind of cube (PowerPivot in the cloud is another obvious direction), Excel is going to be the easiest way to do it for the largest number of people. I do see a role for some kind of SSRS in the cloud too, but even in the Microsoft BI stack at the moment there’s a lot of overlap between SSRS and Excel/Excel Services for reporting; I wonder if this will be rationalised at some point? For example Report Builder has never really caught on as a way of letting end-users build their own reports, so why not forget it, develop Excel for this purpose and somehow extend SSRS’s rich functionality for managing and scheduling reports to work with Excel-based reports? Just a thought.

Cell Security and Calculated Members Defined in the WITH Clause or Session

I was asked an interesting question today about cell security – how can you get it to work with calculated members defined in the WITH clause or the session? If, for example, you create a role on the Adventure Works database with the following expression in the Read Permissions box for cell security:

[Geography].[Geography].currentmember is [Geography].[Geography].[Country].&[Australia]
( [Measures].currentmember IS [Measures].[Reseller Sales Amount]
[Measures].currentmember IS [Measures].[Reseller Order Count])

If you then run the following query when connecting via the role:

[Measures].[Reseller Sales Amount] * [Measures].[Reseller Order Count]
{MEASURES.TEST, [Measures].[Reseller Sales Amount], [Measures].[Reseller Order Count], [Measures].[Reseller Gross Profit]}
ON 0,
ON 1
FROM [Adventure Works]

You’ll see the the following results:


This is pretty much what you’d expect – you only see values for Australia for Reseller Sales Amount and Reseller Order Count. The problem here is how you can grant access so the user can see the measure TEST which has been defined in the WITH clause. If you try the following expression in the role:

[Geography].[Geography].currentmember is [Geography].[Geography].[Country].&[Australia]
( [Measures].currentmember IS [Measures].[Reseller Sales Amount] OR [Measures].currentmember IS [Measures].[Reseller Order Count] OR [Measures].currentmember IS [Measures].TEST)

You see the following error in a few cells in the query above:
#Error CellPermission (3, 162) The member ‘[TEST]’ was not found in the cube when the string, [Measures].[TEST], was parsed.

And in any case, how can you know in advance what the name of the calculated measures you want to grant access to are going to be anyway?

What we actually want to do is retain control of all measures, calculated or otherwise, that are defined on the cube, but allow access to any calculated measures defined in the WITH clause or the session. The way to do it is to rely on the fact that cell security is evaluated after the MDX Script and do the following. First define a named set on the cube at the end of the MDX Script something like this:


This set contains all of the measures, including calculated measures, that have been created on the MDX Script – except those that have their Visible property set to False (these you’d have to add to the set manually).

Then use an expression such as this in your role:

[Geography].[Geography].currentmember is [Geography].[Geography].[Country].&[Australia]
( [Measures].currentmember IS [Measures].[Reseller Sales Amount] OR [Measures].currentmember IS [Measures].[Reseller Order Count] OR

The last line is the new part: it checks to see whether the currentmember on Measures is included in the set we’ve just defined. If it is, we can be sure it’s defined on the cube. If it isn’t, it must be defined either in the WITH clause or in the session so we can grant access to it. If you then run the query again, you get the following results:


As you can see, we can now see the value for TEST. Of course you also need to be extremely careful you’re not using a security hole here by doing this, test thoroughly and use Read Contingent permissions as appropriate.

1 Day “Introduction to MDX Course”, Switzerland

Continuing my travels, I can also announce now that I’ll be running a 1-day version of my “Introduction to MDX” course at Microsoft Switzerland’s offices in Wallisellen, near Zurich, on March 16th. Full details and registration can be found here:

Despite what it says at the top of the page, the event will be in English – sadly, my German isn’t quite good enough to explain the concepts of MDX yet. And it’ll be nice to go back to the MS offices in Wallisellen: I used to work there, you know…

PASS Germany (Rhein/Main Chapter) Meeting on PowerPivot

I was just thinking the other day how long it’s been since I’ve done a user group meeting in the UK… the reason being, unfortunately, that I’m hardly ever at home except at weekends. Blame the rubbish state of the UK economy, the weak pound and the strong Euro. Anyway, I am pleased to announce that I’ll be speaking at PASS Germany’s Rhein/Main chapter on the 9th of February, doing a session called “Introduction to Self-Service BI with Office 2010 and PowerPivot”. More details can be found here:

Hope to see some of you…

Business Analysis Tool Desktop

Continuing my occasional series of reviews of SSAS client tools, I recently took a quick look at Business Analysis Tool Desktop from BIT Impulse, a company based in the Ukraine. It’s aimed at the power-user market, which Proclarity desktop used to dominate (and still does, to be honest, despite the fact it died several years ago, but I won’t go off on that rant again), and offers advanced analysis functionality for people who find Excel pivot tables too basic and restrictive.

It makes a good first impression – a nice, modern UI, with a look-and-feel that will be very familiar to users of Proclarity and also Tableau. To start you need to create a ‘workbook’, which contains multiple ‘pages’, which can contain several different types of analysis.

The first page type can contain either a table, a table and a chart, or just a chart. Query building is accomplished by dragging and dropping hierarchies either onto the rows and columns of a pivot table, or onto a ‘shelf’ on the top of the pivot table, and this works very smoothly and intuitively. All of the advanced selection mechanisms you’d expect are present: you can select individual members, entire hierarchies or levels, descendants, and so on.


Complex filters of either the Rows or Columns axis, or specific hierarchies that have been selected, can also be built up using one or more conditions; similarly you can sort axes and hierarchies easily too, and do Office 2007-like cell highlighting to create heatmaps. Once the query has been executed, there’s a nice feature whereby you can hide some or all of the real-estate connected to query building such as the lists of dimensions and measures and the ‘shelves’; I also liked the way it was possible to resize rows and columns in the grid to make the layout clearer.


The other page types include a treemap:


…and a rather cool scatter graph that can be animated to display changes in data over time, and which I spent quite a bit of time playing with:


Overall, it’s certainly a strong competitor in its sector and worth checking out if you’re in the market for this type of tool – I liked it. I wouldn’t say it was miles better than any of the other tools like it that I’ve reviewed in the last year, but it’s definitely no worse and has its own particular strengths.

Solving the ‘Events in Progress’ Problem in DAX, Part Two: Aggregating Dates and Multiselect

You may recall that by the end of last week’s post on solving the ‘events in progress’ problem in DAX, I’d managed to get the right values out of PowerPivot at the Date level with good performance. BUT when, for example, I selected a Calendar Year I got no values out, and I hadn’t even thought of how to handle situations such as when Date was on filter and the user had selected three different, non-consecutive dates. Well, that was last week when I was a DAX newbie – this week, well, I’m slightly less of a newbie and I’ve got the solution.

First, let’s restate the problem a bit more clearly. What I want is this: when I have a date on columns or rows, I want my calculated measure to display the number of rows in the FactInternetSales table that have an OrderDate less than or equal to that date, and have a ShipDate that is greater than or equal to that date. This way I see the number of rows in the table where something has been ordered but not yet shipped. When I select a year, I want to see the number of rows where that OrderDate-ShipDate range intersects with the dates in that year. So if I selected the year 2002, I want all the rows where either the OrderDate is in 2002, or the ShipDate is in 2002, or when the OrderDate is before 2002 and the ShipDate is after 2002. Finally if I make a multiselect when Date is on a filter, for example if I multiselect the 1st January 2002, the 5th of January 2002 and the 10th January 2002, I want to count the number of rows where the OrderDate-ShipDate range includes any of those three dates.

I spent several hours thinking how I could do this in DAX using just the columns I had and I came up with the following formula, which gave me the results I was expecting… when it ever returned, because it was horrendously slow:

, FILTER(‘FactInternetSales’

It’s using the Calculate function to evaluate the expression CountRows(FactInternetSales); what it’s trying to do in the outermost filter is to filter all of FactInternetSales, and find all the rows where there is at least one Date in the current selection that is between the OrderDateKey and the ShipDateKey. But, as we learned last week, doing a filter on all the columns in a table is very inefficient, and what we actually want to do is to filter on just the columns we’re interested in: OrderDateKey and ShipDateKey. However what I found was that I really needed all of the distinct combinations of OrderDateKey and ShipDateKey, but I could only get either all of the columns in a table using the table name, or the distinct set of values in a single column using the Values() function. Therefore I couldn’t actually filter on the distinct combinations of OrderDateKey and ShipDateKey where OrderDateKey>=DateKey>=ShipDateKey. This seems like a limitation of DAX from what I can see, though it might turn out that there is a way of doing it – if there is I will of course blog about it.

Anyway, the solution was to create a new column that did contain the distinct combinations of OrderDateKey and ShipDateKey. I went into the PowerPivot designer and created a new column called OrderShip on FactInternetSales with the following expression:


As you can see, it simply concatenates the OrderDateKey and ShipDateKey values into a 16 character string:


I was then able to use that column in a filter as follows, extracting the OrderDateKey and the ShipDateKey from the string using the Left and Right functions, and turning them back into numbers using the Value function:

(‘DimDate'[DateKey]>=VALUE(LEFT(‘FactInternetSales'[OrderShip], 8)))

The logic is the same as with the previous expression, with the all-important exception that the outer Filter only works on the distinct values of OrderShip. It works and it’s fast, so I’m relatively happy although I would like to find a more elegant solution.

It not only gives the correct values for individual dates:


but also for Years (and other aggregations of Dates – also notice how the sum of the years is greater than the same as the Grand Total, because of course an order can be in progress in two different years and so needs to be counted in each):


and also for multiselect (where again the value is not the sum of the values of the three individual dates):



I think this illustrates one area where DAX is superior to MDX. MDX is notoriously bad at handling multiselect, but as Marius explained to me in DAX multiselect support is built-in: the Values() function is rather like a CurrentSet function, returning the set of all selected (‘existing’ in MDX terms) values in a column in a particular context.

So, another problem solved… time to think up the next challenge!

Solving the ‘Events in Progress’ Problem in DAX

One problem I’ve spent a lot of time thinking about over the last couple of years is what I call the ‘events in progress’ problem. The problem is this: if you have a fact table where each row represents an event, and there are two date columns representing the start and end dates of the event, on any given date how many events were actually in progress (ie how many rows are there where your given date is between the start and the end date)? It’s trivial to solve in SQL, but in SSAS and MDX there’s no truly satisfying solution. Richard Tkatchuk wrote up the best solution for SSAS here, but since it still involves a fair amount of hacking of the data I’ve never felt it was particularly elegant; I’ve always felt like there should be a pure MDX solution, but all of the approaches I’ve tried have been unusably slow. As a result, I was curious to see whether there would be an elegant way of solving this problem in PowerPivot and DAX…

It’s quite easy to use the AdventureWorks DW database to illustrate the problem. The following SQL query returns the number of orders that had been placed but had not yet shipped on any given date:

select DimDate.DateKey
, COUNT(*) as OrdersInProgress
from DimDate,
DimDate.DateKey  <= FactInternetSales.ShipDateKey
group by DimDate.DateKey
order by Dimdate.DateKey asc


The first thing to do in PowerPivot is, obviously, to import the DimDate and FactInternetSales tables. However the next step is less than obvious: you must not create a relationship between the two tables. Even though PowerPivot suggests you should, you won’t get this solution to work if you do – and as a result, in my opinion, it would be great if this message could be turned off (vote here if you agree).


Next we need to create a new calculated measure in the FactInternetSales table, and we need to come up with a DAX expression for it. Even though the logic is fairly simple, I struggled at first because the Calculate statement doesn’t allow you to use the Min or Max functions directly in one of its filter parameters. Then, though, I realised that the same limitation doesn’t apply to the Filter function and I came up with the following, which worked:

Filter(‘FactInternetSales’, ‘FactInternetSales'[OrderDateKey]<=MIN(‘DimDate'[DateKey])), ‘FactInternetSales'[ShipDateKey]>=MAX(‘DimDate'[DateKey])))

Unfortunately, although it was faster than any of my MDX approaches in SSAS, it was still not really fast enough. So I went back to my friends at Microsoft, and Marius Dumitru (again) came up with the following which is really fast:

Filter(Values(FactInternetSales[OrderDateKey]), Min(DimDate[DateKey])>=[OrderDateKey]),
Filter(Values(FactInternetSales[ShipDateKey]), Max(DimDate[DateKey])<=[ShipDateKey]))


The next problem is that this only works when something is explicitly selected from DateKey. When you put Calendar Year on rows or columns though, for example, it returns nothing. I’m currently working on getting this fixed, but that will be another blog post!

Anyway couple of questions need to be asked at this point. First of all, how/why does this work? Well Marco has just posted a great explanation of the Calculate statement here that I recommend you to read and reread; it covers the theory behind this expression and a lot more.

Secondly why is this second expression so much faster than the first? Here’s what Marius told me: in the first expression the filtering is done on each individual row in the fact table. In the second expression the filters act on the distinct values from both OrderDateKey and ShipDateKey, and since the number of distinct values is smaller than the overall number of rows this is evaluated very quickly; the results of these two filters can be then used in a further, very efficient request to the PowePivot/Vertipaq engine. In Marius’ own words, the takeaway is that “it’s always a best practice to avoid filtering an entire table if the filter condition involves only some of the table columns (i.e. when no key/unique columns are used in the condition expression). The way to go is to only filter the columns involved in the condition (using Filter over All, Values or Distinct applied to the desired columns), and use those filter results as parameters to Calculate to carry out the rest of the computation”. Note that this isn’t a matter of block computation vs cell-by-cell because all calculations in DAX are evaluated in block mode.

So there you go, the first DAX optimisation tip on this blog!

%d bloggers like this: