Defining DAX Measures In The With Clause Of An MDX Query

It’s a little-known fact (but certainly not completely unknown – it was mentioned in Marco, Alberto and my SSAS Tabular book I think) that you can define measures using DAX in the WITH clause of an MDX query. This means you can write queries like the following against an SSAS Tabular model:

with
measure ‘Date'[Demo Calc] =
countrows(‘Date’)

select {measures.[Demo Calc]} on 0,
[Date].[Calendar Year].members on 1
from [Model]

image

The official documentation, such as it is, is here:
http://msdn.microsoft.com/en-us/library/hh758441.aspx

Unfortunately you can’t use it from Excel 2013 using the new ‘create calculated measure’ functionality; I also talked to the nice people behind OLAP PivotTable Extensions and there are some very good reasons why they can’t support this either.

What use is this then? You’re only going to be able to use it in scenarios where you control the generation of the MDX on the client side, such as SSRS reports, which may not be all that often; in fact, in these situations you might be better off writing the whole query in DAX. It’s only going to be useful when you need the power of MDX and DAX in the same query. For example, you might want to take advantage of DAX’s superior ability to detect multiselects, but write all your other calculations in MDX. I’m clutching at straws here though! Still, it’s an interesting thing to know about. Please leave a comment if you can thing of a situation where you can use it…

Some Thoughts About Power BI

By now you’ll probably have seen the Power BI announcement from Microsoft. It’s an important one, and if you haven’t seen it I suggest you take a look at the official announcements and website here:

http://blogs.office.com/b/office-news/archive/2013/07/08/announcing-power-bi-for-office-365.aspx
http://blogs.office.com/b/office365tech/archive/2013/07/07/what-powers-power-bi-in-office-365.aspx
http://blogs.technet.com/b/dataplatforminsider/archive/2013/07/08/introducing-power-bi-for-office-365.aspx?WT.mc_id=Social_TW_OutgoingEvents_20130708_25941_SQLServer
http://office.microsoft.com/en-us/excel/power-bi-FX104080667.aspx
http://blogs.msdn.com/b/powerbi/archive/2013/07/07/getting-started-with-pq-and-pm.aspx

Andrew Brust also has a good summary of the news here:
http://www.zdnet.com/microsoft-announces-power-bi-for-office-365-7000017746/

There’s no point me repeating what’s already been said, so I thought I’d post my initial reaction to it:

  • Proper Mobile BI! HTML 5 Power View! It works on iPads too! Hurray! At-bloody-last! The mobile BI solution will allow you to find, interact with and share “Excel and Power View content”, so I guess that includes Excel worksheets (with PivotTables, slicers etc) as well as Power View reports (a close look at the screenshots in the blog posts above back this up).
  • Making Power BI available only via Office 365 is going to be a very controversial strategy in the MS BI partner community. To be clear: as far as I understand it, Data Explorer (now Power Query), GeoFlow (now Power Map), mobile BI and all of the cool stuff that’s just been announced is only going to be available through Power BI, and therefore Office 365. Unfortunately the biggest companies, with the biggest BI budgets, are often the ones who are slowest to upgrade to the latest versions of Office and a lot of cases this won’t change just because someone wants to see their reports on an iPad. Where IT department inertia, worries about data privacy and company politics mean that Office 365 is not an option, Microsoft will lose out to the pure-play BI vendors who offer standalone solutions.
  • If you look at this from a different point of view, though, some of the things that I (as a BI Pro) feel least comfortable about in Microsoft’s BI strategy are also its greatest strengths. The way I see it, MS is not treating self-service BI as a solution in its own right, but selling self-service BI as a feature of Office. This makes a lot of sense from Microsoft’s point of view – it’s building on the fact that Excel is the tool of choice for data analysis for 99% of users. What I think is going to happen as a result of this is that, rather than partners selling BI as a standalone solution as in the past, we’re going to be talking to people who have already got Office 2013 or Office 365 and are looking to make the most of the BI features that come as part of that. The MS BI partner community is going to have to adjust to this because I doubt MS is going to change this strategy soon.
  • The same can be said of the Office 365-only strategy. If Microsoft is going to be successful with its cloud-first strategy then it’s going to have to prioritise cloud functionality over on-prem functionality. I think MS is doing the right thing with its cloud-first strategy, so therefore, even though I’m going to find it painful when I have to deal with customers that won’t or can’t move to Office 365, I can understand why MS is making Power BI Office 365 only. Beyond the hype (MS says that 1 in 4 of its customers is already on Office 365), it does seem like the uptake of Office 365 is quite strong, especially in SMBs, so hopefully there will be a large potential customer base.
  • I’ve been presenting sessions on the Excel 2013 BI stack at various user groups and conferences over the last few months and it’s gone down very well indeed. A lot of people have come up to me after seeing my session and said that they had been looking at QlikView and Tableau for BI, and would now consider Office 2013 as another alternative. The BI functionality on its own is pretty good, and good enough for a lot of customers even if it isn’t as mature as some of the competing offerings; the fact that the BI functionality comes baked into Office is the killer. While it may be expensive to upgrade to Office 2013/Office 365 this is a cost that many businesses will be considering anyway regardless of their BI requirements; you also have to compare this with the cost of QlikView and Tableau licenses and remember that not every user will need the most expensive SKUs of Office 2013.
  • The ability to refresh data in Excel workbooks deployed to Sharepoint Online, even when the data sources are on-prem, is a key feature and one that I’ve been waiting for. I wonder what the performance will be like?
  • For anyone of a certain age, the first reaction to the news of the natural language querying capability is two words: English Query. I haven’t played with it so I can’t pass judgement, but it’s going to have to be pretty impressive if anyone is going to use it for more than just sales demos. We shall see…
  • I am quite curious about the enterprise data search capabilities. Leaving aside the ability to query them in natural language, the ability to search for data across the enterprise will be useful. I think this is what happened to Project Barcelona.
  • Similarly, it seems as though this search capability is going to be significantly expanded on the public internet. At the moment, in Data Explorer Power Query we’ve seen the ability to query Wikipedia for data. Being able to query many other public data in the same way will be very powerful. There are a number of sites like Quandl that already make public datasets very easy to find and download, and the new search and query capabilities could leapfrog them.
  • No announcement on pricing has been made as yet. Please, please be ridiculously cheap!
  • We don’t have a date for the preview yet, but if you sign up here you’ll be notified when it’s available. It’s meant to be coming “later this summer”.

UPDATE: Some more things to add…

  • Something I didn’t pick up on at the time, but which emerged on Twitter later, is that PowerPivot has had a name change: it’s now Power Pivot with a space, to make it consistent with PowerV View and so on. This might seem minor, but for those of us who write books and have to sweat these details, it’s quite important!
  • I sense I’ve hurt a few feelings at MS with my comments on the natural language query. Let me be clear about my position here: I’ve not played with it, so I can’t pass judgement yet. I can imagine that natural language search for data will work well, but I will be very, very impressed if natural language query works well enough to be used on real data by real users. Real data is dirty and complex and user expectations will be very high and easily dashed. My guess is that the main issue will be that users can’t distinguish between what is a query and what is a calculation, and while the product can probably do queries well (eg “Show me the sales of widgets this year”) it may struggle with calculations (eg “Show me the customer churn by month this year”). But as I said, we shall see.

Finally, and as always, I reread this post this morning and worried that I sound too negative when in fact I’m very positive overall. To summarize:

  • The fact this is is all in Office and specifically Excel = the killer feature.
  • Mobile BI = good, even if it’s very late.
  • Power Query = very, very, very good indeed. I love it already and I think it’s going to be as big, if not bigger than Power Pivot. Power BI is worth buying for this alone.
  • Office 365 requirement = a problem for some customers, maybe, but understandable from Microsoft’s point of view.
  • Cloud requirement = again, a problem for some, but understandable and a big advantage for SMEs who can’t afford the cost of hardware and time to configure Sharepoint on-prem. The ability to refresh in the cloud from on-prem data sources is the key feature here.
  • Power Map = OK. Useful for customers who need geographic analysis, but it’s main use is that it’s great for demos (and this should not be underestimated – all products need some wow).
  • Power View goes HTML5 = relief. The Silverlight dependency undermined its credibility no end.
  • Natural language search for data sources = potentially very useful.
  • Natural language query of those data sources = see above. I remain to be convinced.
  • Data stewardship features = I haven’t seen enough of these to be able to comment.

More blog posts worth reading on this subject:
http://www.jenunderwood.com/blog.htm#O365PowerBI
http://www.jenstirrup.com/2013/07/power-business-intelligence-for.html

UPDATE #2: even more things to add…!

You can see the video of Amir’s demo from WPC here: http://www.youtube.com/watch?v=Jsa-5LGx_IY&feature=youtu.be

Some more details (which should be accurate) from a friend of mine at the conference:

  • The Power View standalone app is still Silverlight based, it’s only the mobile app that uses HTML5
  • Power Query queries can be shared between users via BI Sites, but the execution of these queries always takes place in desktop Excel. Queries will appear in searches when they’re published to a BI Site.
  • Excel workbooks connected to on-prem SSAS (Tabular and Multidimensional) will also be refreshable from a BI Site
  • Natural Language queries (what I think is being called “Q&A”) will also work against SSAS Tabular models. The Categorization property that’s in the Advanced tab in the PowerPivot window, and also in SSDT, is partly used to help Q&A do this.
  • External users can be given access to reports in BI Sites.
  • There will be a “Data Steward Portal” which will help you monitor who is doing what on your BI Site.
  • No comment on when this will arrive in Sharepoint on-prem. My feeling is that MS have no plans to do this at all, or maybe will only do it if a lot of people complain…?

Another Look At Google BigQuery

About a year ago I wrote a post looking at Google BigQuery which finished on a bum note when I ran into a limitation with the size of tables that could be used in a join. Recently I found out that that particular limitation had been lifted with the introduction of the new JOIN EACH operator, and since my account was still active and the data was still uploaded, I thought I’d see if my query could be made to run.

Just as a bit of background (if you can’t be bothered to read my previous post): the query runs on a sample data set originally created by Vertica (see here) that consists of a 1.2GB csv file with two integer columns and 86,220,856 rows. The rows represent the edges in a social network, and the aim of the query is to find the number of triangles – the number of cases where person A is connected to person B, B is connected to C and A is also connected to C. The query joins this large table to itself not once, but twice.

Anyway, to cut a long story short the query did work this time. Here’s the new version, rewritten to use JOIN EACH:

select count(*)
from
(select
e2.Source as e2Source, e2.Destination as e2Destination,
e3.Source as e3Source, e3.Destination as e3Destination
from
(select * from [Edges.EdgesFull]) as e2
join each
(select * from [Edges.EdgesFull]) as e3
on e2.destination = e3.source
where e2.source < e3.source) as e4
join each
(select * from [Edges.EdgesFull]) as e1
on e1.destination = e4.e2source
and e4.e3destination = e1.source
where e1.source < e4.e2source

Here’s the evidence:

image

It took 98.7 seconds to run, which I think is pretty good. Some remarks:

  • I’m not 100% sure this is the correct result, but I’m fairly confident.
  • I’m a long way from being 100% sure this is the most efficient way to write the query.
  • When the original Vertica blog post came out it sparked a p*ssing contest between vendors who wanted to to show they could do better than Vertica. You can see an example of other people’s results here; clearly other platforms could do a lot better, and remember these results are two years old.
  • Obviously you can’t judge the performance of a database on one query.

That’s all. My mind is now at rest, and I’ll go back to being a Microsoft fanboy tomorrow.

Building A Simple BI Dashboard With Visio 2013 And Visio Services

Let me start this post by saying that I am a long way away from being a Visio expert – I’ve used Visio, of course, to create diagrams and I’ve also played with its BI capabilities in the past, but nothing more than that. A recent post on the Visio team blog reminded me about Visio’s BI capabilities and Jen Underwood then mentioned that Visio 2013 has some new functionality for BI, so I thought I’d check it out in more detail and blog about what I found. I’ve never seen Visio used to build dashboards or reports in the real world, but a quick search shows that the Visio pros out there have been doing this for years, so maybe it’s time us BI folks learned a few tricks from them? Visio 2013 is by no means a perfect tool for BI but I was pleasantly surprised at what it can do: you can create data-linked diagrams/dashboards in Visio on the desktop very easily, and then publish them to Visio Services in Sharepoint where they can be viewed in the browser and where the data can be refreshed.

First of all, the PowerPoint deck here is a good place to start to learn about Visio and Visio Services 2013 dashboards, as is the Visio team blog and Chris Hopkins’ blog. There’s also a walkthrough of how to link data to shapes here, and a lot of other good posts out there on creating charts and graphs in Visio such as this one.

Here’s what I did. To start, I created a few tables with data in in Excel to act as a data source, then published the workbook up to Excel Services in Sharepoint Online (I have an Office 365 E4 subscription). The data looked like this:

image

I then opened up Visio 2013, connected to the workbook in Excel Services and imported the data from these two tables. With that done, I was able to select a shape, and then drag a row of data from the External Data pane onto the sheet, which gave me a data-linked shape. It was then fairly easy to configure the data graphic associated with each shape – for example, in the diagram below, I selected a City shape, then dragged the row containing sales for London onto the sheet, which gave me a City with the data for London linked to it, and next to the City shape I had an associated data graphic which I configured as a Data Bar of type Multi-bar Graph.

image

The text next to the frowning face is also linked to data from Excel. I could then publish this to Sharepoint Online, and view the diagram in the browser just by opening it from the document library:

image

All very straightforward. In Visio Services you can add comments, and also refresh the data. Data can be refreshed manually or on a schedule; I used Excel Services data in this demo because in Office 365 only Excel Services and Sharepoint List data sources can be refreshed but the story is much better in on-prem Sharepoint (the PowerPoint deck I linked to above has all the details). Weirdly, I found that if I modified my Excel data source in the Excel Web App it took a few minutes for the new data to come through in Visio even with me clicking Refresh, although it did eventually.

Obviously this is a very basic (and badly designed) dashboard that works within the limitations of Visio and Visio Services, and if you want to learn how to do this properly I suggest you check out the links above. But there are two important questions that now need to be answered:

Why, as a BI pro, would I want to create a dashboard in Visio rather than, say, Excel or Power View?

I suspect Visio isn’t used more widely in MS BI circles is 20% down to ignorance of what it can do, 30% the cost of licensing and the Sharepoint dependency, and 50% the fact that there are only a limited number of scenarios where it is the right tool to use. So when would you actually want to use it? The risk of using Visio is that you end up with a visually appealing infographic that is actually very bad at conveying the information you want to convey, the kind of thing Stephen Few is complaining about here. You’d probably only want to use it if the nature of the diagram contributed to your understanding of the data. For example if you wanted to look at which seats were filled more frequently than others in a theatre or an aeroplane, it might be useful to have a diagram showing the seat layout and colour the seats that get filled. I guess these scenarios are very similar to the kind of scenarios where it makes sense to plot geographical data on a map.

What functionality is Visio missing for it to be a serious BI tool?

Quite a lot. Leaving aside PivotDiagrams, there is no proper support for SSAS or PowerPivot for data linking and that’s a big problem in these days of self-service BI. I also don’t link the way you have to import data into Visio before it can be used: I’d want to be able to select the data I want using a PivotTable-like interface (generating MDX or DAX queries) and then bind it to shape, so that I could slice and filter my data inside Visio without having to keep on importing it; I imagine being very similar to Power View today, but where you could drag data-driven shapes onto a canvas instead of tables and graphs. Maybe Power View and Visio need to get together and have children?

 

I don’t want to finish this post on a critical note, though, because I’ve had a lot of fun learning more about Visio and its BI capabilities, and I hope to use it on a project soon. Now that Sharepoint and especially Office 365 are being pushed so heavily for BI (and being used more widely), maybe we’ll see a lot more of Visio dashboards?

Optimising Returning Customers Calculations in MDX

One of the more popular blog posts from my archives (86 comments so far) is the one I wrote on “Counting New and Returning Customers in MDX”. The trouble with all of the calculations in there is that they execute in cell-by-cell mode, and therefore perform quite badly.

For example, take the following query on Adventure Works to find the number of returning customers (customers who have bought from us today and have also bought something before in the past):

with

member measures.[Returning Customers V1] as

count(

intersect(

nonempty([Customer].[Customer].[Customer].members, [Measures].[Internet Sales Amount])

,

nonempty([Customer].[Customer].[Customer].members, 

    [Measures].[Internet Sales Amount] * {null : [Date].[Date].currentmember.prevmember})

)

)

 

select {measures.[Returning Customers V1]} on 0,

[Date].[Date].[Date].members

on 1

from 

[Adventure Works]

 

On a cold cache this takes 47 seconds on my laptop and a quick look in Profiler shows this executes in cell-by-cell mode. In the comments on the original post Deepak Puri suggested an alternative approach using the Customer Count distinct count measure:

with

member measures.customerstodate as

aggregate(null:[Date].[Date].currentmember, [Measures].[Customer Count])

 

member measures.customerstoprevdate as

([Date].[Date].currentmember.prevmember, [Measures].customerstodate)

 

member measures.newcustomers as

measures.customerstodate - measures.customerstoprevdate

 

member measures.[Returning Customers V2] as

[Measures].[Customer Count] - measures.newcustomers

 

select {measures.[Returning Customers V2]} on 0,

[Date].[Date].[Date].members

on 1

from 

[Adventure Works]

Interestingly, this performs even worse than the previous query (although I would have expected it to be better). So how can we write a query that returns in a reasonable amount of time?

I haven’t found a way to do this for a calculated measure defined on the server, to be used in a true ad hoc query environment like Excel (any suggestions welcome – please leave a comment if you can do it), but I have got a way of optimising this calculation for scenarios where you have control over the MDX being used, such as in SSRS.

Here’s the query:

with

 

set customerdates as

nonempty(

[Date].[Date].[Date].members

*

[Customer].[Customer].[Customer].members

, [Measures].[Internet Sales Amount])

 

set nondistinctcustomers as

generate(

customerdates,

{[Customer].[Customer].currentmember}, all)

 

member measures.customercountsum as

sum(null:[Date].[Date].currentmember, [Measures].[Customer Count])

 

member measures.[Returning Customers V3] as

count(

intersect(

subset(nondistinctcustomers

    , (measures.customercountsum, [Date].[Date].currentmember.prevmember)

    , [Measures].[Customer Count])

,

head(nondistinctcustomers

    , (measures.customercountsum, [Date].[Date].currentmember.prevmember))

)

)

 

 

select {measures.[Returning Customers V3]} on 0,

[Date].[Date].[Date].members

on 1

from 

[Adventure Works]

 

On my laptop, this query executes in around 5 seconds on a cold cache. The reason it’s so much faster is also the reason it can’t be used in ad hoc queries – it uses named sets to find all the combinations of customer date needed by the query in one operation. Here’s a step-by-step explanation of how it works:

  • First of all, the customerdates set gets a set of tuples containing every single combination of day and customer where a purchase was made, using a simple Nonempty().
  • Next, the nondistinctcustomers set takes the customerdates set and removes the dates, so what we are left with is a list of customers. It’s not a list of distinct customers, however – a given customer may appear more than once. This still represents a list of the customers that bought something each day, it’s just that we no longer have any information about which day we’re looking at.
  • The customercountsum measure allows us to take the list of customers in the nondistinctcustomers set and find out which customers bought something in any given day. It’s a running sum of the Customer Count measure. This is a distinct count measure, and usually you wouldn’t use the Sum() function on a distinct count, but it’s important we do here. How is it used? For example, let’s imagine we had just three days of data: on the first day we had three customers, on the second four customers and on the third five customers. That would result in the nondistinctcustomers set containing twelve (not necessarily distinct) customers. We can then use the running sum of a distinct count of customers to find out the index of the item in nondistinctcustomers that is the last customer in the list for each day. So on day two we would have a running sum of seven, and therefore the seventh item in nondistinctcustomers gives us the last customer in the list for that day.
  • Finally, the Returning Customers V3 measure gives us the number of returning customers each day. It uses the customercountsum measure to find the subsets of the nondistinctcustomers set that represent the customers that bought on the current day and the customers that bought on all days up to yesterday, then uses the Intersect() function to find the returning customers.

Upcoming SSAS and MDX Training Course Dates

If you’ve got some training budget burning a hole in your pocket, here’s a quick reminder of some upcoming SSAS and MDX courses I’m teaching:

Hope to see you there! If you’d like to find out about new courses by me and other SQL Server people, you can sign up to the Technitrain newsletter at http://www.technitrain.com

PS I’ve also got Andy Leonard coming over to London to run his ever-popular SSIS Design Patterns course too, in early September.

Flattening A Parent/Child Relationship In Data Explorer (Power Query)

NOTE: This post was written before Data Explorer was renamed as Power Query. All of the content is still relevant to Power Query.

I was teaching my SSAS cube design and performance tuning course this week (which I’ll be teaching in Sydney and Melbourne next month, along with some MDX – places still available!) and demonstrating BIDS Helper’s excellent functionality for flattening parent/child relationships, and it got me thinking – can I do the same thing in Data Explorer? Not that I need to do this in Data Explorer, you know, but it’s the kind of challenge I like to set myself. Of course you can do it, and quite elegantly, and since I learned yet more interesting stuff about Data Explorer and M while I was cracking this problem I thought I’d blog about it.

Here’s what I want to do. Consider the parent/child hierarchy in the DimEmployees table in the Adventure Works DW database:

image

Each row represents an employee, EmployeeKey is the primary key and ParentEmployeeKey is the key of the employee’s boss. Therefore, by joining the table to itself, we can recreate the org chart of the Adventure Works company (ie who reports to who). The problem though is that we need to join the table to itself multiple times to do this, and the number of times we need to do the join depends on the data itself. If you flatten a parent/child hierarchy by doing this, the end result should have a series of columns representing each level in the hierarchy, and look something like this:

image

This problem can be solved in SQL reasonably easily, even if the SQL you end up writing might look a little scary (see the views that BIDS Helper generates for an example of this). What about Data Explorer?

At the heart of my approach was a recursive function. I’ve blogged about creating functions in Data Explorer already, so you might want to read that post for some background. Here’s my function declaration:

let

    Source = (FromTable, KeyColumn, ParentKeyColumn, ToTable, optional Depth) =>

let

    GetDepth = if (Depth=null) then 1 else Depth,

    GetKeyColumn = if (Depth=null) then KeyColumn

        else Number.ToText(GetDepth-1) & "." & KeyColumn,

    GetParentKeyColumn = Number.ToText(GetDepth) & "." & ParentKeyColumn,

    JoinTables = Table.Join(FromTable,{GetKeyColumn},

        Table.PrefixColumns(ToTable , Number.ToText(GetDepth)),

            {GetParentKeyColumn}, JoinKind.LeftOuter),

    FinalResult = if

        List.MatchesAll(Table.Column(JoinTables, GetParentKeyColumn), each _=null)

        then FromTable

        else RecursiveJoin(JoinTables, KeyColumn, ParentKeyColumn, ToTable, GetDepth+1)

in

    FinalResult

in

    Source

A few interesting things to point out:

  • I’ve used a LET statement inside my function declaration, so I can have multiple statements inside it
  • I’ve used Table.Join to do the left outer join between the two tables I’m expecting
  • The parameters I’m using are:
    • FromTable – the table on the left hand side of the join. When the function is first called, this should be a table that contains the Employees who have no parents (ie where ParentEmployeeKey is null); when the function calls itself, this will be the result of the join.
    • ToTable – the table on the right hand side of the join. This is always a table that contains the Employees who do have parents.
    • KeyColumn – the name of the Employee’s key column
    • ParentKeyColumn – the name of the Employee’s parent key column
  • I’ve used Table.PrefixColumn to rename all the columns in the table on the right hand side of the join, prefixing them with the depth of the call stack, so I get distinct column names.
  • The function calls itself until it finds it has done a join where the last ParentKeyColumn contains only null values. I’ve used List.MatchesAll to check this.

Here’s the call to this function – you only need to include one step in the Data Explorer query to do this – to return the flattened structure:

= RecursiveJoin(

    Table.SelectRows(Employees, each [ParentEmployeeKey]=null),

    "EmployeeKey",

    "ParentEmployeeKey",

    Table.SelectRows(Employees, each [ParentEmployeeKey]<>null)

    )

And here’s the output:

image

In this case the output isn’t exactly the same as what BIDS Helper might produce, because BIDS Helper has some special requirements for SSAS user hierarchies. Also, since I’m still learning Data Explorer and M, I’m not sure my code in the most efficient, elegant way. But I still think it’s an interesting example and I hope it’s useful to other Data Explorer enthusiasts out there – we’re a small but growing band!

You can download my demo workbook here.

A New Events-In-Progress DAX Pattern

I’ve been working on a very complex SSAS Tabular implementation recently, and as a result I’ve learned a few new DAX tricks. The one that I’m going to blog about today takes me back to my old favourite, the events-in-progress problem. I’ve blogged about it a lot of times, looking at solutions for MDX and DAX (see here and here), and for this project I had to do some performance tuning on a measure that uses a filter very much like this.

Using the Adventure Works Tabular model, the obvious way of finding the number of Orders on the Internet Sales table that are open on any given date (ie where the Date is between the dates given in the Order Date and the Ship Date column) is to write a query something like this:

EVALUATE

ADDCOLUMNS (

    VALUES ( 'Date'[Date] ),

    "OpenOrders",

    CALCULATE (

        COUNTROWS ( 'Internet Sales' ),

        FILTER( 'Internet Sales', 'Internet Sales'[Ship Date] > 'Date'[Date] ),

        FILTER( 'Internet Sales', 'Internet Sales'[Order Date] <= 'Date'[Date] )

    )

)

ORDER BY 'Date'[Date]

On my laptop this executes in around 1.9 seconds on a cold cache. However, after a bit of experimentation, I found the following query was substantially faster:

EVALUATE

ADDCOLUMNS (

    VALUES ( 'Date'[Date] ),

    "OpenOrders",

    COUNTROWS(

        FILTER(

            'Internet Sales',

            CONTAINS(

                DATESBETWEEN('Date'[Date]

                    , 'Internet Sales'[Order Date]

                    , DATEADD('Internet Sales'[Ship Date],-1, DAY))

                , [Date]

                , 'Date'[Date]

            )

        )

    )

)

ORDER BY 'Date'[Date]

On a cold cache this version executes in just 0.2 seconds on my laptop. What’s different? In the first version of the calculation the FILTER() function is used to find the rows in Internet Sales where the Order Date is less than or equal to the Date on rows, and where the Ship Date is greater than the Date. This is the obvious way of solving the problem. In the new calculation the DATESBETWEEN() function is used to create a table of dates from the Order Date to the day before the Ship Date for each row on Internet Sales, and the CONTAINS() function is used to see if the Date we’re interested in appears in that table.

I’ll be honest and admit that I’m not sure why this version is so much faster, but if (as it seems) this is a generally applicable pattern then I think this is a very interesting discovery.

Thanks to Marco, Alberto and Marius for the discussion around this issue…

UPDATE: Scott Reachard has some some further testing on this technique, and found that the performance is linked to the size of the date ranges. So, the shorter your date ranges, the faster the performance; if you have large date ranges, this may not be the best performing solution. See https://twitter.com/swreachard/status/349881355900952576

UPDATE: Alberto has done a lot more research into this problem, and come up with an even faster solution. See: http://www.sqlbi.com/articles/understanding-dax-query-plans/

BI Survey 13

Just a quick post to say that the BI Survey 13 is now open and would like your feedback about the BI tools you’re using. Here’s the link:
https://digiumenterprise.com/answer?link=1358-CTDDUWQN

It’s worth doing because you’ll get a summary of the results and will get entered in a draw for some Amazon vouchers; and of course you get to make sure that lots of nice things are said about the Microsoft BI stack! I’ll blog about the results later in the year when they come out – the findings are always very interesting.

Analysis Services Multidimensional Now Works With Power View–And Why That’s Important

By now you may have already heard the news that, as part of SQL Server 2012 SP1 CU4, new functionality has been released that means that Power View now works with Analysis Services Multidimensional (ie cubes, as opposed to the Tabular Model, which always worked with Power View). I won’t bother to repeat the technical details which you can read about here:
http://blogs.msdn.com/b/analysisservices/archive/2013/05/31/power-view-connectivity-for-multidimensional-models-released.aspx

…but the main points are that Analysis Services Multidimensional can now be queried in DAX, and this plus some tweaks to Power View mean that the two can be used together for the first time. Unfortunately Power View in Excel 2013 doesn’t work with Analysis Services Multidimensional yet, but I hope that will also be fixed very soon.

I’ve been playing with the public CTP of this for a while and done a few presentations with it, and from a technical point of view it’s a solid bit of work by the Analysis Services dev team. It just works, and while there are a few limitations they’re trivial. Arguably it should not have been necessary to do it in the first place – why didn’t Power View speak MDX when it was built, which would have meant it could have queried both Tabular and Multidimensional? But it’s here now, and that’s what counts. It also opens up some interesting possibilities for using DAX queries to create detail-level reports on cubes, and also for defining DAX calculations inside those queries.

However I think its real importance is strategic. This is the first significant bit of new functionality in Analysis Services Multidimensional for a long while, and it acts as a bridge between the classic SQL Server BI stack that most of us are using and the brave new world of Office/Sharepoint-led BI. It is also the first time in a long time that Analysis Services Multidimensional users have had a dedicated client tool for data analysis from Microsoft that isn’t Excel. Don’t get me wrong, I love Excel as a client tool for SSAS but I’ve always thought (and I think industry trends over the last few years support this view) that even though Excel is a great way to bring data analysis to the masses, there’s still an important niche among power users for a more advanced data analysis and data visualisation tool.

You may be thinking at this point that pretty graphs and charts are all very well, but your users don’t need anything other than the SSRS reports and basic PivotTables that they’ve been using for the last few years. I say that you ignore Power View at your own risk. Microsoft’s competitors in the BI space are hungry for new customers and are interested in migration projects. You might well arrive at the office next Monday morning to find that there’s a new CFO who used QlikView in his last job, and who wants the same pretty graphs and charts he had there again. It’s not going to be any use arguing that you’ve spent years developing this cube, that it’s lightning fast and has all sorts of tricky business logic coded into thousands of lines of MDX – if your BI solution’s user interface looks and feels dated, then whatever its technical merits it will have the musty smell of legacy software about it. If, however, you can fire up a VM with Sharepoint 2013 and Power View on and show off some slick dashboards created from your existing cubes, even if this is something the majority of your end users wouldn’t really be interested in (and you may be wrong, they might love it), you’re going to be showing the business two important things:

  • Microsoft can do sexy dashboards and visualizations too, and while they come at a price, that price is probably a lot less than it would cost to rip and replace what you’ve got with a competitor’s software. So the option’s there if you want to spend the money and do the upgrade.
  • Analysis Services cubes are not a dead-end, and Microsoft has made a significant investment here to prove this. I’d still love to see a coherent roadmap that explains where Microsoft is heading with its BI tools and how it expects its existing customers to get there, but I doubt we’ll get one. This functionality was, however, delivered in response to popular demand, so I’m hopeful that if we as customers can make our voices heard as to what we want in the future then we can influence Microsoft’s direction.

So go forth and Power View. Both Rob Kerr and Koen Verbeeck have recently published some excellent, detailed guides to setting up a Sharepoint 2013 demo environment; you have no excuse for not testing this out and being ready to face the competition.