Two New Books: “The Definitive Guide To DAX” And “’M’ Is For Data Monkey”

I’m not going to pretend that this blog post is a properly impartial review – I know the authors of both of these books to varying degrees – but I thought it was worth writing a few words on two new books I’ve acquired recently which are worth additions to any Power BI enthusiast’s bookshelf or e-reader.

The Definitive Guide To DAX

Something I’ll never understand about my friends Marco Russo and Alberto Russo is their love of writing books – they generally have a new one out every year, sometimes two (personally I find writing books painful). Their latest publication is “The Definitive Guide To DAX” and it does indeed live up to its title. No-one outside the dev team comes close to Marco and Alberto’s knowledge of DAX, the language of Power Pivot, Power BI Desktop modelling and SSAS Tabular, and in this book they have documented everything that they know about it down to the smallest detail. Want to know what the KeepFilters() function does? Or the GenerateAll() function? How about all the new DAX functions and features in the latest versions of Power BI Desktop which will also appear in SSAS 2016 Tabular? They’re all here, and more. As such this is essential purchase for anyone doing serious work on the Microsoft BI platform, although probably more as a reference than a book to read end-to-end. It’s fair to say there’s a certain amount of overlap between this and some of their previous books on Power Pivot and SSAS Tabular, but the language – and the community’s understanding of it – has evolved sufficiently to justify buying this book too.

[I received a free copy of this book for review]

Buy it here from Amazon UK | US

‘M’ Is For Data Monkey

As the author of the only other book on Power Query, I suppose I should really be keeping quiet about “’M’ Is For Data Monkey” in case you buy it instead of mine. However 18 months of UI changes and functionality improvements mean my book is now a bit out-of-date, and what’s more important is that Ken Puls and Miguel Escobar have had the advantage of a lot of real-world experience with Power Query that I didn’t have (indeed no-one had) when I was writing in early 2014. The book itself is not a formal introduction to the M language but a guide to what you can do with it in Power Query; while a lot of what’s here will be useful in Power BI this is definitely a Power Query book and the target audience is Excel Pros rather than BI Pros. The decision to focus on Excel Pros was a good one to make, in my opinion, because it plays to the authors’ strengths and means that the book has a very practical focus. A lot of the tips and tricks here are ones I’ve used successfully myself, and I don’t mind admitting that I learned one or two things from this book as well.

Buy it here from Amazon UK | US

Other Books Are Available…

There are a couple of other new books out that, although I haven’t seen them, will also be worth checking out. Rob Collie has just released Power Pivot and Power BI, essentially the second edition of DAX Formulas For Power Pivot; Matt Allington has just released Learn To Write DAX; both are going to be good choices for Excel users wanting a DAX tutorial. Finally, last week Teo Lachev announced on his blog that he has published the world’s first dedicated Power BI book. Teo is another author whose books I admire so I’m sure it will be excellent, although I’ll be interested to see how he handles the problem of writing about a product that changes so much so quickly.

Power BI Desktop, Sort By Column And DAX Calculations That Use The All() Function

Recently I came across a problem where a DAX measure gave different results in Excel and Power BI Desktop. It turned out not to be a bug or even a feature, but since it confused me for a few minutes I thought it was worth writing up in case anyone else ran into it.

Consider a model built in Excel using Power Pivot from the following two tables:

image

image

With two measures defined in the model as follows:

Sum of Sales:=SUM(Sales[Sales])

Share:=DIVIDE([Sum of Sales], CALCULATE([Sum of Sales], ALL(Month[Month Name])))

…and, importantly, the Sort By Column property on the Month Name column set to Month Number:

image

…it’s possible to build a PivotTable that looks like this:

image

However, when you import the same model into Power BI Desktop and recreate the PivotTable above in the Report view you’ll see that the Share calculation no longer gives the same values:

image

What’s the problem here? It’s all down to the way Power BI Desktop generates DAX queries when you set the Sort By Column property. The Excel PivotTable above generates the following MDX:

SELECT 
{[Measures].[Sum of Sales],[Measures].[Share]} 
DIMENSION PROPERTIES PARENT_UNIQUE_NAME,MEMBER_VALUE,
HIERARCHY_UNIQUE_NAME ON COLUMNS , 
NON EMPTY 
Hierarchize(
{DrilldownLevel({[Month].[Month Name].[All]},,,INCLUDE_CALC_MEMBERS)}) 
DIMENSION PROPERTIES PARENT_UNIQUE_NAME,MEMBER_VALUE,
HIERARCHY_UNIQUE_NAME ON ROWS  
FROM [Model] 
CELL PROPERTIES VALUE, FORMAT_STRING, LANGUAGE, 
BACK_COLOR, FORE_COLOR, FONT_FLAGS

On the rows axis, as you would expect, the only hierarchy you see is Month Name.

However, if you run a Profiler trace (you can find out how to do this here although it’s much easier to get the Process ID from DAX Studio) to look at the DAX query generated by Power BI you’ll see

EVALUATE
TOPN (
    102,
    SUMMARIZECOLUMNS (
        ROLLUPADDISSUBTOTAL (
            ROLLUPGROUP ( 'Month'[Month Name], 'Month'[Month Number] ), 
            "IsGrandTotalRowTotal"
        ),
        "Share", 'Sales'[Share],
        "Sum_of_Sales", 'Sales'[Sum of Sales]
    ),
    [IsGrandTotalRowTotal], 0,
    'Month'[Month Number], 1,
    'Month'[Month Name], 1
)
ORDER BY
    [IsGrandTotalRowTotal] DESC,
    'Month'[Month Number],
    'Month'[Month Name]

The difference here is that the Month Number and Month Name fields are both present – they have to be since the query has to sort by Month Number. In MDX the order of members on a hierarchy can be set inside the model; in a DAX query you can only sort using an ORDER BY clause and for that to work, the field you’re ordering by must be present in the query.

The Share measure calculation needs to be changed in order to fix this, then. Here’s one way of doing this:

Share =

DIVIDE([Sum of Sales],

CALCULATE([Sum of Sales], ALL(Month[Month Name], ‘Month'[Month Number])))

Calculated Tables In Power BI

There are loads of great new features in today’s release of Power BI Desktop, but for me the most important by far is the introduction of calculated tables. Miguel Llopis gives a good introduction to what they are in the post announcing the release, but I thought it was worth going into a bit more detail about what they are and why they’re so useful.

What are calculated tables?

Calculated tables are tables in the Data Model whose data source is a DAX expression that returns a table. Here’s a simple example. Imagine that you have already imported the DimDate dimension table from the Adventure Works DW database into your Power BI Data Model. If you go to the Data tab you would see the contents of that table shown, and on the ribbon you can see the new New Table button:

image

Clicking the button allows you to enter a new table name and a DAX expression that returns the table, such as this one that returns a filtered subset of the rows in the DimDate table:

MyFilteredDimDateTable =
FILTER (
    DimDate,
    DimDate[CalendarYear] = 2002
        && DimDate[EnglishMonthName] = "June"
        && DimDate[DayNumberOfMonth] < 5
)

image

Calculated tables are created when the data in the model is refreshed (like calculated columns), and after that behave like any other table – so you can create relationships between calculated tables and other tables. You can also create calculated tables whose DAX expressions reference other calculated tables. They do take up memory like other tables too, so over-using them could be a bad thing.

Why are calculated tables useful?

Miguel’s blog post already lists some of the scenarios where calculated tables are useful, and I can already think of lots of practical scenarios where I’m going to be using them myself.

Role playing dimensions are one obvious use: in a lot of models you need to use the same dimension table more than once in different places, with different relationships and maybe with different filters in place. It might be that you have a single Company dimension in your data warehouse that contains all of the companies your organisation does business with; with calculated tables you only need to import that table once, and you can then use calculated tables to create filtered copies of that table to use as Supplier and Customer dimension tables, joining them to your Stock and Sales fact tables, and only showing the relevant companies in each case.

Certainly for debugging complex DAX expressions they’re going to be handy, because they allow you to see what DAX table expressions return. We’ve already DAX Studio for that but now we don’t have the hassle of switching to another application…!

I can also see calculated tables as a way of doing certain types of ETL – which raises the question of whether you should do a certain operation in Get Data (ie what was Power Query) or using a calculated table. I strongly suspect that a lot of operations are going to be much faster with calculated tables because of the power of the underlying engine. It would be interesting to know if there are plans to allow Get Data to make use of calculated tables, for example as a way of buffering tables in memory, with M transformations folded back to DAX on those tables.

The Calendar() and CalendarAuto() functions

If you were wondering what the new DAX Calendar() and CalendarAuto() functions were for, well, you can probably guess now – Date tables. The Calendar() function returns a table of continuous dates within a given range. So, the expression

CalendarDemo =
CALENDAR ( "1/1/2015", "2/2/2015" )

Will return a table with one column containing all the dates from January 1st 2015 to February 2nd 2015:

image

The CalendarAuto() function looks at all of the Date columns in all of the other tables in the model, and returns a similar table but one where the first date is the beginning of the year that contains the earliest date found in any non-calculated column in any non-calculated table, and where the last date is the end of the year that contains the latest date found in any non-calculated column in any non-calculated table. By default the beginning of the year is January 1st and the end of the year is December 31st, but there’s also an optional parameter to specify a different month to end the year on, if you want to create a fiscal calendar table.

I wonder if we’ll get something like the functionality in Excel 2016 for creating date tables with other useful columns?

Running Your Own MDX And DAX Queries In Power BI Desktop

Every time there’s a new release of Power Query or Power BI Desktop, I always check to see if there are any interesting new M functions that have been added (I used #shared to do this, as detailed here). For the RTM version of Power BI Desktop I spotted two new functions:

image

As well as ODBC connections, we can now use OLEDB and ADO.NET data sources – although they aren’t shown in the UI yet. And you know what this means… with an OLEDB connection we can now run our own MDX and DAX queries against SSAS data sources! I assume this will be coming in Power Query in Excel soon too.

Here’s an example query showing how to use OleDB.Query() to run an MDX query against the Adventure Works DW cube in SSAS Multidimesional:

let
    Source = OleDb.Query(
              "Provider=MSOLAP.5;Data Source=localhost;
               Initial Catalog=Adventure Works DW 2008", 
              "select {measures.[internet sales amount]} on 0, 
               [date].[calendar].[calendar year].members on 1 
               from [adventure works]"
             )
in
    Source

As you can see, it’s pretty straightforward: you just need to supply a connection string and a query. You will need to tell Power BI Desktop which credentials to use when running the query the first time you connect to SSAS, and that’s probably going to be Windows:

image

You will also see a prompt the first time you run the query, asking for permission to run a Native Database Query:

image

This prompt will appear each time a different MDX query is run; you can turn off this prompt in the Options dialog on the Security tab by unchecking the Require user approval for new native database queries box:

image

Here’s the output of the MDX query from the example code:

image

One-To-One Relationships In Power BI

My blog post from earlier this year about bidirectional relationships and many-to-many in Power BI sparked a lot of interest. What I didn’t realise at the time is that there’s another new feature (albeit rather less exciting) concerning relationships: you can now create one-to-one relationships between tables.

For example, consider the following two tables:

image

Both contain a column called Fruit containing the same, distinct set of values. If you load both these tables into Power BI Desktop, create a relationship between them and make sure the Cardinality is set to 1:1 and Cross Filter Direction to Both, like so:

onetoone

…then not only do you get bi-directional cross-filtering (ie if I select something from Fruit1 it will filter the Fruit2 table, and if I select something from Fruit2 it will filter Fruit1) but the Related() and RelatedTable() functions can be used in a DAX calculated column on either table to look up values in the other. With one-to-many relationships, you can only use Related() in a calculated column on the ‘many’ side of the relationship and RelatedTable() on the ‘one’ side of the relationship.

Thanks to Marius Dumitru for pointing this out to me!

Using DateDiff() To Calculate Time Intervals In DAX

One of the most useful new additions to DAX in Excel 2016 and the Power BI Designer is the DateDiff() function. It does exactly what you would expect: calculate the amount of time in between two dates, and express that value as either seconds, minutes, hours, days, weeks, months, quarters or years.

Here’s a very simple table of dates:

image

With this table loaded into the Power BI Designer, you can add new calculated columns to the table by clicking the New Column button on the ribbon. Here are two calculated column definitions that give the number of days and the number of years between the Start Date and the End Date on each row:

DayDurationColumn = 
DATEDIFF(MyTable[Start Date], MyTable[End Date], DAY)

YearDurationColumn = 
DATEDIFF(MyTable[Start Date], MyTable[End Date], YEAR) 

The output is pretty much what you’d expect:

image

It is of course also possible to create measures that use the DateDiff() function to, for example:

YearDurationMeasure = 
DATEDIFF(
FIRSTDATE(MyTable[Start Date]), 
LASTDATE(MyTable[End Date]), 
YEAR)

image

All very straightforward, then, and much easier than having to calculate these values yourself.

Using SelectColumns() To Alias Columns In DAX

A few years ago I wrote this post on how to alias columns in a table in DAX, using a combination of AddColumns() and Summarize(). The good news is that in Excel 2016/the Power BI Designer/SSAS Tabular 2016 there’s a new DAX function specifically for this purpose: SelectColumns(). Here’s an example of how it can be used:

Imagine you have the following source table, called Products:

image

You can write a DAX query to get all the rows and columns from this table like so:

EVALUATE Products

Here’s the output of that query in DAX Studio (and remember, DAX Studio can connect to data loaded into the Power BI Designer, which is what I’m doing here):

image

You can alias the columns in this table using SelectColumns() very easily, like so:

EVALUATE
SELECTCOLUMNS (
    Products,
    "Column One", Products[Product],
    "Column Two", Products[Colour]
)

Here’s the output:

image

The syntax for SelectColumns() is straightforward: the first parameter is a table expression, and after that there are pairs of parameters consisting of:

  • A new column name
  • An expression returning a column from the table given in the first parameter

As you can see in the output of the query above, I’ve renamed the Product column “Column One” and the Colour column “Column Two”.

This means I can now crossjoin a table with itself without needing to worry about conflicting column names, like so:

EVALUATE
CROSSJOIN (
    Products,
    SELECTCOLUMNS (
        Products,
        "Column One", Products[Product],
        "Column Two", Products[Colour]
    )
)

image

One other interesting thing to note about SelectColumns() is that it allows you to do projection in a DAX query easily – as Marco notes here, it was possible before but it wasn’t pleasant. For example, the query:

EVALUATE
SELECTCOLUMNS (
    Products,
    "Just Colour", Products[Colour]
)

Returns:

image

Notice how there are three rows in the output here and that the value Green occurs twice. If you’re a true DAX afficionado, you might get excited about that.