The Ethics Of Big Data

Some time ago I received a review copy of a book called “Ethics Of Big Data” from O’Reilly; I didn’t get round to writing a review of it here for a number of reasons but, despite its flaws (for example its brevity and limited scope), it’s worth reading. It deals with the ethics of data collection and data analysis from a purely corporate point of view: if organisations do not think carefully about what they are doing then

“Damage to your brand and customer relationships, privacy violations, running afoul of emerging legislation, and the possibility of unintentionally damaging reputations are all potential risks”

All of which is true, although I think what irked me about the book when I read it was that it did not tackle the wider and (to my mind) more important question of the social impact of new data technologies and their application. After all, this is what you and I do for a living – and I know that I haven’t spent nearly enough time thinking these issues through.

What prompted me to think about this again was a post by Adam Curtis which argues that the way that governments and corporations are using data is stifling us on a number of levels from the personal to the political:

“What Amazon and many other companies began to do in the late 1990s was build up a giant world of the past on their computer servers. A historical universe that is constantly mined to find new ways of giving back to you today what you liked yesterday – with variations.

Interestingly, one of the first people to criticise these kind of “recommender systems” for their unintended effect on society was Patti Maes who had invented RINGO. She said that the inevitable effect is to narrow and simplify your experience – leading people to get stuck in a static, ever-narrowing version of themselves.

Stuck in the endless you-loop.”

Once our tastes and opinions have been reduced to those of the cluster the k-means algorithm has placed us in we have become homogenised and easier to sell to, a slave to our past behaviour. Worse, the things we have in common with the people in other clusters become harder to see. Maybe all of this is inevitable, but if there is going to be an informed debate on this then shouldn’t we, as the people who actually implement these systems, take part in it?

Calculating The Value Of Overdue Invoices Using Many-To-Many Relationships in SSAS Multidimensional

Recently I had to solve the same, rather complex, problem for three different customers within the space of two weeks, a variation of the ‘events-in-progress’ problem I’ve blogged about a lot in the past. It’s this one: how can you calculate the value of your overdue invoices on any given date? It’s important to stress that we are not talking about the invoices that are overdue today – we want to be able to pick any date in the past and find out the value of invoices that were overdue at that point in time.

Let’s imagine you have a fact table containing payments against invoices: each row has an invoice number, the date of the payment, the payment amount, and the date that the invoice is due. A positive value in the Amount column indicates that this is the opening amount of the invoice; negative values in the Amount column are payments against the invoice. Multiple payments can be made against an invoice before it is fully closed. Here’s some example data:

image

Looking at the rows highlighted for invoice 5, you can see in the upper box that there is a positive amount of £35 shown on January 1st 2014 – this is the date that the invoice was opened, and £35 is the full value of the invoice. The invoice is due on January 10th 2014. In the lower box you can see there were four separate payments of £5, £5, £10 and £15 before the invoice was fully paid off on January 12th 2014.

Now, if you were to do a total-to-date in MDX (similar to what I describe here) it would be pretty easy to calculate the outstanding amount on all invoices on any given date. However the requirement here is not only to do that, but to break the value down so that you can see what the total value for overdue invoices and non-overdue (which I’m going to call backlog in this post) invoices is on any given date. This is tricky to do because we have to take the Due Date of each invoice into account as well as the Payment Date. The desired output for invoice 5 is this:

image

As you can see in this PivotTable, for invoice 5 £25 was still outstanding on January 9th 2014, but this is shown as backlog because this is before the due date of January 10th. On January 11th, one day after the due date, the remaining outstanding amount of £15 is shown as overdue. Of course, we also want to be able to calculate the correct values for all invoices:

image

One way of solving this problem would be to calculate the overdue and backlog values for each date that each invoice is open in your ETL, and store these values in a snapshot fact table. This works, and will give you the best possible query performance, but it has two major drawbacks: first, it makes your ETL much slower, and second it makes your fact table much larger. This post describes how you can calculate the overdue and non-overdue values on any given date using many-to-many relationships instead, without having to blow out the fact table.

To go along with the fact table (called FactInvoice) shown above, I have a date dimension table called DimDate (I have deliberately reduced the number of rows here to the dates I have data for, for reasons that will become clear soon):

image

I have a dimension table containing all of the days that an invoice can be overdue or not overdue for (which has one less than double the number of rows as the date dimension table), plus a second column classifying each row as ‘Backlog’ or ‘Overdue’:

image

… and an invoice dimension table that just contains the distinct invoice numbers called DimInvoice.

Now, let me explain how to build the SSAS cube.

Step 1

  • Build dimensions from all of the tables shown above, so you have dimensions called Date, Overdue Days and Invoice.
  • Build a cube with one measure group, based on FactInvoice, and create one measure with AggregateFunction Sum based on the Amount column in that table.
  • Add the Invoice dimension to the cube with a regular relationship. Add the Date dimension to the cube twice, as Payment Date and Due Date, with regular relationships on the PaymentDate and DueDate columns. The Dimension Usage tab should look like this:

image

Step 2

  • Create the following view in SQL Server:
  • CREATE VIEW [dbo].[FactDateToPaymentDate]
    AS
    SELECT        a.DateKey, b.DateKey AS PaymentDateKey
    FROM            dbo.DimDate AS a INNER JOIN
                             dbo.DimDate AS b ON a.DateKey >= b.DateKey

    This view returns all of the combinations of a given date and all dates up to and including the date.

  • Add this view to the DSV and create a new measure group from it; you’ll need to create a measure here, but it can just be a Count measure.
  • Add the Date dimension to the cube once again, this time leaving the name as Date (you will now have three role-playing copies of the Date dimension in the cube).
  • Set up regular relationships between the Date and Payment Date dimensions and the new measure group, then a many-to-many relationship between Date and the Invoice measure group. This means that when you query the cube by the Date dimension, the many-to-many relationship will mean you see the sum of all Amounts whose payment date is up to and including the date selected. The Dimension Usage tab will look like this:

image

Step 4

  • Create another view in SQL Server with this definition:
  • CREATE VIEW [dbo].[FactDueDateToAsOfDate]
    AS
    SELECT        a.DateKey AS DueDateKey, b.DateKey AS AsOfDateKey, DATEDIFF(dd, a.FullDate, b.FullDate) AS OverDueDays
    FROM            dbo.DimDate AS a CROSS JOIN
                             dbo.DimDate AS b

    Yes, I am cross joining the DimDate table with itself and yes, this could return a lot of rows. However you should find that the view is very fast to execute.

  • Add this view to the DSV and build another measure group from it, again with a single Count measure on it.
  • Set up a regular relationship between this new measure group and the Due Date dimension.
  • Add the Overdue Days dimension to the cube, give it a regular relationship with the new measure group and a many-to-many relationship with the Invoice measure group through the new measure group.
  • Add the Date dimension to the cube yet again, creating a new role-playing dimension called As Of Date, give it a regular relationship with the new measure group and a many-to-many relationship with the Invoice measure group through the new measure group. The Dimension Usage tab will now look like this:

image

The purpose of this measure group is this: if you select a date on the As Of Date dimension, you will be able to select ‘Overdue’ on the Overdue Days dimension and this will give you all of the dates on Due Date that were overdue on that date.

Step 5

  • You only want to have to select one date in your PivotTable, so create the following MDX calculated measures that take your selection on the Date dimension and applies it to the As Of Date dimension too:

CREATE MEMBER CURRENTCUBE.MEASURES.[Open Amount Hidden] AS
([Measures].[Amount],
LINKMEMBER([Date].[Date].CURRENTMEMBER, [As Of Date].[Date])
), VISIBLE=FALSE;

CREATE MEMBER CURRENTCUBE.MEASURES.[Open Amount] AS
IIF(MEASURES.[Open Amount Hidden]=0, NULL, MEASURES.[Open Amount Hidden]);

  • Hide all measures apart from the calculated measures you’ve just created
  • Hide the Payment Date, Due Date and As Of Date dimensions

Conclusion

This is a very complex pattern, I know, and this is after I’ve simplified it a lot (if you need currency conversion as well then things get even worse) but I also know it’s extremely useful from a business point of view. Query performance is also reasonably good, at least in the places where I have implemented this.

You can download my sample SQL Server database and VS 2012 project here.

Working With Excel Named Ranges In Power Query

One of the more recent additions to Power Query is the ability to access data from named ranges in the Excel worksheet rather than an Excel table. I’ve got used to formatting data as tables in Excel because that’s what Power Pivot needs to import data directly from the worksheet, but if you are working with Power Query and a pre-existing workbook then trying to reformat data as tables can be a pain. Also, if you just want to import a single value, for example as a parameter to a query, a table seems like overkill.

(Incidentally, if you’re wondering what a named range is in Excel, there are tons of good introductions to the subject on the internet like this one. You can do loads of cool stuff with them.)

Consider the following Excel worksheet:

image

There are three named ranges here: FirstRange, SecondRange, ThirdRange, and the values in the cells show which range the cells are in. FirstRange consists of two cells in two columns; SecondRange consists of three cells in a single row; and ThirdRange is consists of three, non-contiguous cells. (You can also use this trick to display the names of all contiguous ranges in an Excel workbook, but alas it does seem to work for non-contiguous ranges).

At the moment, the Power Query ribbon doesn’t make it obvious that you can use named ranges as data sources. However you can see all the tables and cells in a worksheet, and even return that list from a query, by creating a blank query and using the expression

= Excel.CurrentWorkbook()

image

Once you’ve done this you can see all the ranges (and also any tables) in the workbook, and click on the table link next to the name to see the data. For example, clicking on FirstRange shows the following table in a new step in the query editor:

image

The expression to get at this table in a single step is:

= Excel.CurrentWorkbook(){[Name="FirstRange"]}[Content]

The range SecondRange in my example is equally straightforward to reference, and you can see its contents by using the expression

= Excel.CurrentWorkbook(){[Name="SecondRange"]}[Content]

image

Unfortunately ThirdRange, which is not contiguous, is a problem: I can only get the first cell in the range. So the expression

= Excel.CurrentWorkbook(){[Name="ThirdRange"]}[Content]

Returns just this table:

image

It would be nice if we could get a list containing the cell values, rather than a table, for ranges like this…

Last thing to mention is that if you do want the value in a cell, rather than a table, you just need to right-click inside the cell in the Query Editor and select Drill Down:

image

This returns the value (in this case the text “Third Range Cell 1”) in the cell you clicked on:

image

This is a much more useful value to return than a table containing a single row/column, if you intend to use a value from a single cell in a named range as a parameter to another query.

You can download the sample workbook for this post here.

New Power BI Q&A Functionality Released: Optimisation In The Browser

Seems like another new bit of Power BI functionality got released today: the ability to optimize your data model for Q&A in the browser. Here’s the link to the docs:

http://office.microsoft.com/en-us/office-365-business/power-bi-q-a-optimize-a-power-bi-workbook-cloud-modeling-HA104226408.aspx?redir=0

Previously, the ability to add synonyms to your model to improve the results you got from Q&A was only available in Excel on the desktop, inside the Power Pivot window. Now you can do this, as well as new stuff like add phrasings (described here) and view usage reports, in your Power BI site.

I won’t repeat what the docs say about the actual functionality, but this seems to be yet more evidence that Excel on the desktop is no longer the central hub for Power BI. If this is the case, this is a massive strategic change, and I can understand why it has happened: the need for the ‘right’ version of Excel on the desktop is a massive roadblock for Power BI adoption, especially in enterprise accounts (see also Jen Underwood’s comments on this from yesterday). Maybe now it’s BI in the browser instead?

New Power BI Features Shown At WPC

OK, so I’m not at WPC this year but I have just watched this video of Scott Guthrie’s session “The Cloud for Modern Business”. If you’re interested in seeing some new Power BI features take a look at the demo by James Phillips, general manager for Power BI, starting at 21:20:

http://www.digitalwpc.com/Videos/Pages/Videos.aspx?g=4d5ef40c-dc5b-426d-9a7a-8dd6274bb42b#fbid=gaOpLt1jjcA

Some of the new things I noticed:

  • 21:40 – a nice shot of one of the new Power BI dashboards first announced at the PASS BA Conference earlier this year. You can see several new types of visualisation such as treemaps, radar charts and gauges (gauges? GAUGES? Shhh, don’t tell Stephen Few).
  • 22:33 – a list of out-of-the box data sources is shown from which new models can be created. They include: Salesforce, MS Dynamics, Facebook, Google Analytics, Twitter, and Upload Excel.
  • 22:50 – data is imported from Salesforce in the browser. This isn’t happening in Excel on the desktop, folks, it’s in the browser. This is significant!
  • 23:10 – another new visualisation shown, a doughnut chart (if that’s the right term). I see names of people from the Power Query team in the data.
  • 24:50 – a Q&A analysis is pinned to the dashboard
  • 25:50 – much is made of the fact that the dashboard is touch-enabled
  • 25:55 – “Partner Solution Packs” are announced. This sounds important! It seems to be referring to the Salesforce demo earlier, and these solution packs are said to include: data, connectivity to the data sources, visualisation and interactive reports. So it sounds like Microsoft are going to encourage data vendors (or other sources of data) to build these solution packs on top of Power BI as pre-packaged analytical apps. Probably a good idea.
  • 26:15 – editing a dashboard in the browser and swapping one visualisation for another. Again, the HTML 5 browser based editing experience – we haven’t seen Excel once in this demo.
  • 27:55 – “If there was ever a partner opportunity, this is it”. Again much emphasis here. Seems like these new Power BI features, especially the solution packs, are aimed at giving partners incentives to sell and customise Power BI (something which they have not had up to now, to be honest).

Oh, and you probably already heard that Azure Machine Learning is now in public preview. Check out the docs and samples here. I wouldn’t be surprised if there was some integration between this and Power BI to come too.

Power Query Book Published!

Looking for some summer holiday (or winter holiday, depending on which hemisphere you live in) reading? If so, may I suggest my new Power Query book? “Power Query for Power BI and Excel” is available now from the Apress site, Amazon.com, Amazon.co.uk and all good bookstores.

Power Query for Power BI and Excel Cover Image

It’s an introductory level book. It covers all of the stuff you can do in the UI, it has a chapter on M, and it goes into a reasonable amount of detail on more advanced topics; it is not a 500-page exhaustive guide to the product. I’ve focused on readability and teaching the fundamentals of Power Query rather than every looking at every obscure M function, but at the same time if you’ve already used Power Query I think there’ll be plenty of material in there you’ll find interesting.

Now for the bad news: the book is out-of-date already, although not by much. One of the best things about Power Query is the monthly release cycle; unfortunately that makes writing a book on it a bit of a nightmare. I started off writing in January and had to deal with lots of added functionality and changes to the UI over the next few months; I had to retake pretty much all of the screenshots as a result. The published version of the book is based on the version of Power Query that was released in early June rather than the current version. Hopefully you can forgive this – the differences are minor – but it’s a good reason to buy the book as soon as you can! I want to do a second edition in a year’s time once (if?) the release cycle slows down.

I’ve been teased a bit for blogging and teaching so much about Power Query recently, so the final thing I want to say here is why an old corporate BI/SSAS guy like me is getting so excited about a self-service ETL tool. Well, the main reason is that Power Query is a great piece of software. It does what it does very well; it does useful things rather than what the marketing guys/analysts/journalists think is hot in BI; it is easy to use but at the same time is flexible enough for the advanced user to do really complex stuff; it is updated regularly based on feedback from its users. I only wish all Microsoft software was this good… Honestly, I wouldn’t be able to motivate myself to blog and write about Power Query if I didn’t think it was cool, and even though it hasn’t been hyped in the same way as other parts of the Power BI stack it is nonetheless the part that people get excited about when I show them Power BI. It’s not just me either – every day I see positive comments like Greg Low’s here. I think it is as important, if not more important, than Power Pivot and I think it will be a massive success.

Oh, and did I mention that I’m also teaching a Power Query course in London later this year….?

Optimising MDX Calculations With The Unorder() Function

The Unorder() function is probably one of the least used functions in the whole of MDX. It exists only as a query performance hint and, since I had never up to now found a scenario where it did improve the performance of a calculation I had pretty much forgotten about it (as Books Online says, the optimisation it performs is applied automatically in many cases). However I was playing around with some calculations last week and found out that it does have its uses…

What does the Unorder() function do? It’s a function that takes a set and returns a set, and what it does is remove any implicit ordering from that set. By default all sets in MDX are ordered, but for some types of operation that ordering is unimportant and ignoring it can result in faster query performance.

Take, for example, the following query on the Adventure Works cube which shows the number of customers who have bought something up to the current date:

WITH

MEMBER MEASURES.CUSTOMERSTODATE AS

COUNT(

    NONEMPTY(

        [Customer].[Customer].[Customer].MEMBERS

    , {[Measures].[Internet Sales Amount]}

      *

      {NULL : [Date].[Calendar].CURRENTMEMBER})

)

 

SELECT

{MEASURES.CUSTOMERSTODATE}

ON COLUMNS,

[Date].[Calendar].[Date].MEMBERS

ON ROWS

FROM

[Adventure Works]

 

On my laptop it executes in 35 seconds on a cold cache. We can optimise the calculation here simply by wrapping the set of all members on the Customer level of the Customer hierarchy with the Unorder() function, so:

UNORDER([Customer].[Customer].[Customer].MEMBERS)

The following query now executes in 27 seconds on a cold cache:

WITH

MEMBER MEASURES.CUSTOMERSTODATE AS

COUNT(

    NONEMPTY(

        UNORDER([Customer].[Customer].[Customer].MEMBERS)

    , {[Measures].[Internet Sales Amount]}

      *

      {NULL : [Date].[Calendar].CURRENTMEMBER})

)

 

SELECT

{MEASURES.CUSTOMERSTODATE}

ON COLUMNS,

[Date].[Calendar].[Date].MEMBERS

ON ROWS

FROM

[Adventure Works]

 

As far as I can tell, Unorder() only makes a difference on calculations when used in combination with NonEmpty(), and when it is used over a large set (here the set of customers has around 18000 members). If you have calculations like this I would recommend testing to see if Unorder() makes a difference – if it does, please leave a comment and let me know what you find!

Technitrain Courses In London This Autumn

I’ve just put up a bunch of new courses (including a dedicated Power Query course!) on the Technitrain site that will be running this autumn in London. They are:

I hope to see some of you there! Don’t forget you can also get 10% off on my MDX training videos and lots of other great MS BI content at Project Botticelli using the discount code TECHNITRAIN2014.

Using List.Generate() To Make Multiple Replacements Of Words In Text In Power Query

Recently I had a request for help from someone who wanted to do the following in Power Query: take a piece of text and then, using a table, search for all of the occurrences of the words in one column of the table in the text and replace those words with those in the other column. So, for example, given these two tables in Excel:

image

You want to take the table on the left and for each piece of text replace the words in the ‘Word To Replace’ column of the right-hand table with those in the ‘Replace With’ column of the right-hand table. The output would therefore be:

image

An interesting challenge in itself, and one I solved first of all using a recursive function. Here’s some code showing how I did it:

let

    //Get table of word replacements

    Replacements = Excel.CurrentWorkbook(){[Name="Replacements"]}[Content],

    //Get table containing text to change

    TextToChange = Excel.CurrentWorkbook(){[Name="Text"]}[Content],

    //Get a list of all words to replace

    WordsToReplace = Table.Column(Replacements, "Word To Replace"),

    //Get a list of all words to replace with

    WordsToReplaceWith = Table.Column(Replacements, "Replace With"),

    //Recursive function to do the replacement

    ReplacementFunction = (InputText, Position)=> 

    let 

     //Use Text.Replace to do each replace

     ReplaceText = Text.Replace(

            InputText, 

            WordsToReplace{Position}, 

            WordsToReplaceWith{Position})

    in

     //If we have reached the end of the list of replacements

     if Position=List.Count(WordsToReplace)-1 

      then 

      //return the output of the query

      ReplaceText 

      else 

      //call the function again

      @ReplacementFunction(ReplaceText, Position+1),

    //Add a calculated column to call the function on every row in the table

    //containing text to change

    Output = Table.AddColumn(TextToChange, "Changed Text", each ReplacementFunction([Text], 0))

    

in

    Output

 

It does the job, but… after thinking about this some more, I wondered if there was a better way. A lot of my recent Power Query blog posts have used recursive functions, but are they a Good Thing? So I asked on the forum, and as usual the nice people on the Power Query dev team answered very promptly (that’s one of the things I like about the Power Query dev team – they engage with their users). Recursive functions are indeed something that should be avoided if there is an alternative, and in this case List.Generate() can be used instead. Here’s how:

let

    //Get table of word replacements

    Replacements = Excel.CurrentWorkbook(){[Name="Replacements"]}[Content],

    //Get table containing text to change

    TextToChange = Excel.CurrentWorkbook(){[Name="Text"]}[Content],

    //Get list of words to replace

    WordsToReplace = Table.Column(Replacements, "Word To Replace"),

    //Get list of words to replace them with

    WordsToReplaceWith = Table.Column(Replacements, "Replace With"),

    //A non-recursive function to do the replacements

    ReplacementFunction = (InputText)=> 

     let

       //Use List.Generate() to do the replacements

       DoReplacement = List.Generate(

                          ()=> [Counter=0, MyText=InputText], 

                          each [Counter]<=List.Count(WordsToReplaceWith), 

                          each [Counter=[Counter]+1, 

                                MyText=Text.Replace(

                                         [MyText], 

                                         WordsToReplace{[Counter]}, 

                                         WordsToReplaceWith{[Counter]})], 

                          each [MyText]),

       //Return the last item in the list that

       //List.Generate() returns

       GetLastValue = List.Last(DoReplacement)

     in

      GetLastValue,

    //Add a calculated column to call the function on every row in the table

    //containing the text to change

    Output = Table.AddColumn(TextToChange, "Changed Text", each ReplacementFunction([Text]))

in

    Output

 

List.Generate() is a very powerful function indeed, albeit one that took me a while to understand properly. It’s a bit like a FOR loop even if it’s a function that returns a list. Here’s what each of the parameters I’m passing to the function in the example above do:

  •  ()=> [Counter=0, MyText=InputText] returns a function that itself returns a record (a record is a bit like a table with just one row in it). The record contains two fields: Counter, which has the value 0, and MyText which is given the value of the text where the values are to be replaced. This record is the initial value that List.Generate() will modify at each iteration.
  • each [Counter]<=List.Count(WordsToReplaceWith) returns a function too. An each expression is a quick way of declaring a function that takes one, unnamed parameter, and in this case the value that will be passed to this parameter is a record of the same structure as the one declared in the previous bullet. The expression [Counter] gets the value of the Counter field from that record. The function returns a boolean value, true when the value in the [Counter] field of the record is less than or equal to the number of items in the list of words to replace. List.Generate() returns a list, and while this function returns true it will keep on iterating and adding new items to the list it returns.
  • each [Counter=[Counter]+1, MyText=Text.Replace([MyText], WordsToReplace{[Counter]}, WordsToReplaceWith{[Counter]})] returns yet another function, once again declared using an each expression. The function here takes the record from the current iteration and returns the record to be used at the next iteration: a record where the value of the Counter field is increased by one, and where the value of the MyText field has one word replaced. The word that gets replaced in MyText is the word in the (zero-based) row number given by Counter in the ‘Word To Replace’ column; this word is replaced by the word in the row number given by Counter in the ‘Replace With’ column.
  • each [MyText] returns a very simple function, one that returns the value from the MyText field of the record from the current iteration. It’s the value that this function returns that is added to the list returned by List.Generate() at every iteration.

To illustrate this, here’s a simplified example showing how List.Generate() works in this case:

let

    WordsToReplace = {"cat", "dog", "mat"},

    WordsToReplaceWith = {"fish", "snake", "ground"},

    Demo = List.Generate(

                          ()=> [Counter=0, MyText="the cat and the dog sat on the mat"], 

                          each [Counter]<=List.Count(WordsToReplaceWith), 

                          each [Counter=[Counter]+1, 

                                MyText=Text.Replace(

                                         [MyText], 

                                         WordsToReplace{[Counter]}, 

                                         WordsToReplaceWith{[Counter]})], 

                          each [MyText])

 

in

    Demo

 

The output of this query is the list:

image

This list can be written as (with the words changed at each iteration highlighted):

{“the cat and the dog sat on the mat”, “the fish and the dog sat on the mat”,  “the fish and the snake sat on the mat”, “the fish and the snake sat on the ground”}

So, another useful function to know about. I’m slowly getting to grips with all this functional programming!

You can download the sample workbook here.

Using Slicer Selections In The CubeSet Function

I had an interesting challenge from a customer yesterday – one of those problems that I’d known about for a long time but never got round to working out the solution for…

Consider the following PivotTable, based on a PowerPivot model using Adventure Works data, in Excel 2010:

image

It shows the top 10 products by the measure Sum of Sales. There are two slicers, and the top 10 shown in the PivotTable reflects the selections made in the slicers. All of this works fine. But what if you want to use Excel cube functions to do the same thing? You can write the MDX for the top 10 products quite easily and use it in the CubeSet() function in your worksheet, but how can you get your MDX set expression to respect the selection made in the slicers?

The solution to this problem is very similar to the trick I showed here – finding the selected items in a slicer is not easy! Here are the steps I followed to do it:

  • Add the slicers for EnglishOccupation and CalendarYear to a new worksheet
  • Go to Slicer Settings and uncheck the box for “Visually indicate items with no data”
  • Add two new PivotTables to the worksheet. Connect one to the EnglishOccupation slicer and put EnglishOccupation on rows; connect the other to the CalendarYear slicer and put CalendarYear on rows.
  • Use the OLAPPivotTableExtensions add-in (which you can download here) to add new MDX calculated measures to each PivotTable. For the EnglishOccupation PivotTable call the measure SelectedOccupations and use the following MDX:
    SetToStr(Except(Axis(0), {[Customer].[EnglishOccupation].[All]}))
    This expression does the following: it uses the Axis() function to find the set of members selected on what Excel thinks of as the rows axis in the PivotTable (actually the MDX columns axis), then uses Except() to remove the All Member from the hierarchy (which Excel uses for the Grand Totals) and then uses SetToStr() to take that set and return the string representation of it. Do the same thing for the PivotTable showing CalendarYear too, calling the calculated measure SelectedYears; the MDX in this case is:
    SetToStr(Except(Axis(0), {[Date].[CalendarYear].[All]}))
    This is what the EnglishOccupation PivotTable should look like:
    image
  • Next, to make things easy, use Excel formulas to get the values from the top cell inside each PivotTable into cells elsewhere in the worksheet, and give these cells the names SelectedOccupations and SelectedYears.
    image
  • Then enter a CubeSet() function into a new cell using the following formula:
    =CUBESET(
    "PowerPivot Data",
    "Topcount(
    [Product].[EnglishProductName].[EnglishProductName].members,
    10,
    Sum(" & SelectedOccupations & " * " & SelectedYears & ",[Measures].[Sum of Sales])
    )",
    "Top 10 Set")
    What this does is use the TopCount() function to find the top 10 Products, and in the third parameter of this function which is the numeric expression to find the top 10 by, it crossjoins the two sets of selected occupations and selected years and then sums the output of the crossjoin by the measure [Sum of Sales].
  • Last of all, build your report using the Excel cube functions as normal, using the CubeRankedMember() function to get each item from the top 10 set created in the previous step.

image

You can download my sample workbook here.

The bad news about this technique is that it doesn’t work in Excel 2013 and Power Pivot. It’s no longer possible to create MDX calculated measures on Power Pivot models in Excel 2013, alas. It will work if you’re using any version of Excel from 2007 on against Analysis Services and, as I show here, Excel 2010 and PowerPivot. If you are using Power Pivot and Excel 2013 it might be possible to create a DAX measure to do the same as the MDX I’ve used here (I’m wondering if the technique Jason describes here will work). It would certainly be possible to use CubeRankedMember() to find each item selected in the slicer, as Erik Svensen shows here, and then use Excel formulas to find the MDX unique name for each selected member and concatenate these unique names to create the set expression that my calculated measures return, but that’s a topic for another post. This really should be a lot easier than it is…