Nested Variables In DAX

Last week, at the SQL Server Days conference in Belgium, Kasper mentioned in his presentation that it was possible to define variables inside variables in DAX. So, for example, you could define a measure like so:

MyMeasure = 
var Outer1 = 
               var Inner1 = 1
               var Inner2 = 2
               return Inner1 + Inner2
var Outer2 = 3
return Outer1 + Outer2

This measure returns 6 as you might expect:

image

There aren’t any performance benefits to doing this, although of course it helps with code readability and organisation (thanks to Marius for confirming this).

With my newly rekindled love of DAX I thought this was quite interesting. I’m not really sure why though, given that it’s not particularly useful; I think Matt might be right:

image

Defining Variables In DAX Queries

Variables are the best thing to happen to DAX since, well forever – they are so cool I’m almost ready to like DAX as much as I like MDX. There are already several good articles and blog posts out there describing how to use them (see here and here), but I was looking at a Profiler trace the other day and saw something I hadn’t yet realised about them: you can declare and use variables in the DEFINE clause of a DAX query. Since my series of posts on DAX queries still gets a fair amount of traffic, I thought it would be worth writing a brief post showing how this works.

Say you have the following table (called Sales) in your model:

image

You can declare DAX variables in the DEFINE clause of a query like so:

DEFINE
    VAR MyMonth = "January"
    VAR FilteredMonths =
        FILTER ( VALUES ( Sales[Month] ), Sales[Month] = MyMonth )
EVALUATE
CALCULATETABLE ( Sales, FilteredMonths )

This query returns the following result:

image

The benefits of using variables in this way are the same as you get when using variables in measures and calculated columns: improved readability, less repetition and potential performance improvements.

I also wondered whether I would be able to refer to these variables inside measures declared in the DEFINE clause, but unfortunately you can’t. The following query:

DEFINE
    VAR MyMonth = "January"
    VAR FilteredMonths =
        FILTER ( VALUES ( Sales[Month] ), Sales[Month] = MyMonth )
    MEASURE Sales[FirstMeasure] =
        COUNTROWS ( FilteredMonths )
EVALUATE
ROW ( "First Measure", [FirstMeasure] )

…returns the error

“Failed to resolve name ‘FilteredMonths’. It is not a valid table, variable or function name”.

However if you define your calculations inside the query itself, for example using the Summarize() or AddColumns() functions, or like so:

DEFINE
    VAR MyMonth = "January"
    VAR FilteredMonths =
        FILTER ( VALUES ( Sales[Month] ), Sales[Month] = MyMonth )
EVALUATE
ROW (
    "First Calc", COUNTROWS (FilteredMonths),
    "Second Calc", CALCULATE (SUM(Sales[Sales]), FilteredMonths)
)

…the query works:

image

In a lot of cases, multiple calculations contain some of the same logic and being able to use variables to share tables and values between calculations opens up some really interesting opportunities for performance optimisations.

SSAS Multidimensional Caching-Related Performance Problems With Power BI Tables

Last week I was doing some performance tuning for a customer using Power BI on top of a large SSAS Multidimensional cube via a Live connection. Some of their reports were performing particularly badly and I uncovered a problem with the DAX generated by Power BI for tables with a large number of rows, fields from two or more large hierarchies and totals turned on.

The problem is very easy to reproduce; I’m going to do it using a simplified version of the Adventure Works cube that contains only the Date and Product dimensions. Take a look at the following table from a Power BI report:

image

It has the Date attribute from the Date dimension, and the Product attribute from the Product dimension, on rows and two measures on columns. The table has a large number of rows in it (both Date and Product are fairly large hierarchies) and if you look at the DAX generated by Power BI you can see that it only requests the first 501 rows. That’s fine – the real problem is that Power BI also generates a second DAX query to get the two values displayed in the Total line at the bottom of the table. The DAX looks something like this, and is easily identifiable because it uses the Row() function:

EVALUATE
  CALCULATETABLE(
    ROW(
      "Sales_Amount", 'Internet Sales Facts'[Sales Amount],
      "Tax_Amt", 'Internet Sales Facts'[Tax Amt]
    ),
    KEEPFILTERS(
      GENERATE(
        KEEPFILTERS(VALUES('Product'[Product.Key0])),
        CALCULATETABLE(
          FILTER(
            KEEPFILTERS(VALUES('Order Date'[Date.Key0])),
            OR(
              NOT(ISBLANK('Internet Sales Facts'[Sales Amount])),
              NOT(ISBLANK('Internet Sales Facts'[Tax Amt]))
            )
          )
        )
      )
    )
  )

This query has something in it – I don’t know what – that means that it cannot make use of the Analysis Services Storage Engine cache. Every time you run it SSAS will go to disk, read the data that it needs and then aggregate it, which means you’ll get cold-cache performance all the time. On a big cube this can be a big problem. This is very similar to problems I’ve seen with MDX queries on Multidimensional and which I blogged about here; it’s the first time I’ve seen this happen with a DAX query though. I suspect a lot of people using Power BI on SSAS Multidimensional will have this problem without realising it.

This problem does not occur for all tables – as far as I can see it only happens with tables that have a large number of rows and two or more hierarchies in. The easy way to check whether you have this problem is to refresh your report, run a Profiler trace that includes the Progress Report Begin/End and Query Subcube Verbose events (and any others you find useful) and then refresh the report again by pressing the Refresh button in Power BI Desktop without changing it at all. In your trace, if you see any of the Progress Report events appear when that second refresh happens, as well as Query Subcube Verbose events with an Event Subclass of Non-cache data, then you know that the Storage Engine cache is not being used.

image

Also look at the Duration column in the trace for these events which shows the time in milliseconds that they took to execute. This will tell you how much of an impact this problem is having on your report refresh times.

The easy way to stop this happening is to turn off the totals row in the table:

image

Displaying the totals in a separate table also seems to avoid the problem, although of course it doesn’t look as good. Only putting one hierarchy in the table apart from your measures, also seems to solve the problem. You could also try all the usual methods to improve SSAS performance such as building aggregations on the cube.

The dev team is aware of this problem but it’s unlikely to be fixed in the short term.

Dynamic Chart Titles In Power BI

As you probably know, charts (and lots of other visualisations) in Power BI have titles that can be set to any piece of static text. You can do this by selecting the chart, going to the Format tab in the Visualizations pane, then changing the properties in the Title section as shown below (full documentation here):

image

But what about if you want the chart title to change depending on what is selected? For example, you might be using slicers or filters to allow a user to choose which days of the week they want to see data for. In that situation you might want to add a title that shows which days of the week have actually been selected; this would be particularly important if the report uses filters, or if the report is going to be printed. Unfortunately the built in Title Text property can’t be used to display dynamic values but in this blog post I’ll show you how to solve this problem using DAX.

Here’s a simple example of a report that contains a dynamic chart title:

image

Using data from the Adventure Works DW database I’ve created a simple data model containing a Date dimension table called DimDate and a fact table called FactInternetSales; the DimDate table contains a field called EnglishDayNameOfWeek that contains the names of the days of the week, and the report contains a column chart that shows a Sales measure broken down by day of week. There’s also a slicer where the user can select one or more day and at the top there’s a title that lists the day names selected in the slicer and displayed in the chart.

There are two parts to the solution. The first part is to create a measure that will return the text needed for the chart title, and this relies on the DAX ConcatenateX() function that I blogged about here. Here’s the DAX for the measure:

Title =
"Sales Amount for "
    & CONCATENATEX (
        VALUES ( 'DimDate'[EnglishDayNameOfWeek] ),
        'DimDate'[EnglishDayNameOfWeek],
        ", "
    )

Here, the Values() function is used to return a table containing all of the selected days of the week, and this is then passed to ConcatenateX() to get a text value containing a comma delimited list of the day names.

The second part of the solution deals with how to display the value returned by the measure. In the report above I used a Card visualisation, dropped the measure above into the Field area and then turned off the Category Label on the Format tab so that only the value returned by the measure, and not the name of the measure itself, is displayed:

image

image

And this is all you need to do to recreate the report above.

We can make this better though! Instead of a simple comma delimited list of day names it would be better if we could change the last comma in the list to an “and”:

image

Also, if all the day names were displayed, it would be good not to display a long list of day names but show some default text instead:

image

Here’s the DAX for a measure that does all this:

Title2 =
VAR SelectedDays =
    VALUES ( 'DimDate'[EnglishDayNameOfWeek] )
VAR NumberOfSelectedDays =
    COUNTROWS ( SelectedDays )
VAR NumberOfPossibleDays =
    COUNTROWS ( ALL ( 'DimDate'[EnglishDayNameOfWeek] ) )
VAR AllButLastSelectedDay =
    TOPN ( NumberOfSelectedDays - 1, SelectedDays )
VAR LastSelectedDay =
    EXCEPT ( SelectedDays, AllButLastSelectedDay )
RETURN
    "Sales Amount "
        & IF (
            NumberOfSelectedDays = NumberOfPossibleDays,
            "By Day Of Week",
            "For "
                & IF (
                    NumberOfSelectedDays = 1,
                    "",
                    CONCATENATEX ( 
                       AllButLastSelectedDay, 
                       'DimDate'[EnglishDayNameOfWeek], 
                       ", " )
                        & " And "
                )
                & LastSelectedDay
        )

Using a series of DAX variables to make the code more readable, here’s what this measure does:

  • If the number of days selected is the same as the total number of possible days, return the title text “By Day Of Week”, otherwise
    • If two or more days have been selected, then return a comma delimited list containing all but the last selected day (I used TopN() to get that table of all but the last selected day) plus a trailing “ And “. If only one day has been selected, return an empty string. Then
    • Concatenate the last selected day to the text returned by the previous step. I’ve used the Except() function to find the day that was excluded by the TOPN() function in the previous step.

You can download a .pbix file containing all the code from this post here and I’ve published the report here.

Two New Books: “The Definitive Guide To DAX” And “’M’ Is For Data Monkey”

I’m not going to pretend that this blog post is a properly impartial review – I know the authors of both of these books to varying degrees – but I thought it was worth writing a few words on two new books I’ve acquired recently which are worth additions to any Power BI enthusiast’s bookshelf or e-reader.

The Definitive Guide To DAX

Something I’ll never understand about my friends Marco Russo and Alberto Russo is their love of writing books – they generally have a new one out every year, sometimes two (personally I find writing books painful). Their latest publication is “The Definitive Guide To DAX” and it does indeed live up to its title. No-one outside the dev team comes close to Marco and Alberto’s knowledge of DAX, the language of Power Pivot, Power BI Desktop modelling and SSAS Tabular, and in this book they have documented everything that they know about it down to the smallest detail. Want to know what the KeepFilters() function does? Or the GenerateAll() function? How about all the new DAX functions and features in the latest versions of Power BI Desktop which will also appear in SSAS 2016 Tabular? They’re all here, and more. As such this is essential purchase for anyone doing serious work on the Microsoft BI platform, although probably more as a reference than a book to read end-to-end. It’s fair to say there’s a certain amount of overlap between this and some of their previous books on Power Pivot and SSAS Tabular, but the language – and the community’s understanding of it – has evolved sufficiently to justify buying this book too.

[I received a free copy of this book for review]

Buy it here from Amazon UK | US

‘M’ Is For Data Monkey

As the author of the only other book on Power Query, I suppose I should really be keeping quiet about “’M’ Is For Data Monkey” in case you buy it instead of mine. However 18 months of UI changes and functionality improvements mean my book is now a bit out-of-date, and what’s more important is that Ken Puls and Miguel Escobar have had the advantage of a lot of real-world experience with Power Query that I didn’t have (indeed no-one had) when I was writing in early 2014. The book itself is not a formal introduction to the M language but a guide to what you can do with it in Power Query; while a lot of what’s here will be useful in Power BI this is definitely a Power Query book and the target audience is Excel Pros rather than BI Pros. The decision to focus on Excel Pros was a good one to make, in my opinion, because it plays to the authors’ strengths and means that the book has a very practical focus. A lot of the tips and tricks here are ones I’ve used successfully myself, and I don’t mind admitting that I learned one or two things from this book as well.

Buy it here from Amazon UK | US

Other Books Are Available…

There are a couple of other new books out that, although I haven’t seen them, will also be worth checking out. Rob Collie has just released Power Pivot and Power BI, essentially the second edition of DAX Formulas For Power Pivot; Matt Allington has just released Learn To Write DAX; both are going to be good choices for Excel users wanting a DAX tutorial. Finally, last week Teo Lachev announced on his blog that he has published the world’s first dedicated Power BI book. Teo is another author whose books I admire so I’m sure it will be excellent, although I’ll be interested to see how he handles the problem of writing about a product that changes so much so quickly.

Power BI Desktop, Sort By Column And DAX Calculations That Use The All() Function

Recently I came across a problem where a DAX measure gave different results in Excel and Power BI Desktop. It turned out not to be a bug or even a feature, but since it confused me for a few minutes I thought it was worth writing up in case anyone else ran into it.

Consider a model built in Excel using Power Pivot from the following two tables:

image

image

With two measures defined in the model as follows:

Sum of Sales:=SUM(Sales[Sales])

Share:=DIVIDE([Sum of Sales], CALCULATE([Sum of Sales], ALL(Month[Month Name])))

…and, importantly, the Sort By Column property on the Month Name column set to Month Number:

image

…it’s possible to build a PivotTable that looks like this:

image

However, when you import the same model into Power BI Desktop and recreate the PivotTable above in the Report view you’ll see that the Share calculation no longer gives the same values:

image

What’s the problem here? It’s all down to the way Power BI Desktop generates DAX queries when you set the Sort By Column property. The Excel PivotTable above generates the following MDX:

SELECT 
{[Measures].[Sum of Sales],[Measures].[Share]} 
DIMENSION PROPERTIES PARENT_UNIQUE_NAME,MEMBER_VALUE,
HIERARCHY_UNIQUE_NAME ON COLUMNS , 
NON EMPTY 
Hierarchize(
{DrilldownLevel({[Month].[Month Name].[All]},,,INCLUDE_CALC_MEMBERS)}) 
DIMENSION PROPERTIES PARENT_UNIQUE_NAME,MEMBER_VALUE,
HIERARCHY_UNIQUE_NAME ON ROWS  
FROM [Model] 
CELL PROPERTIES VALUE, FORMAT_STRING, LANGUAGE, 
BACK_COLOR, FORE_COLOR, FONT_FLAGS

On the rows axis, as you would expect, the only hierarchy you see is Month Name.

However, if you run a Profiler trace (you can find out how to do this here although it’s much easier to get the Process ID from DAX Studio) to look at the DAX query generated by Power BI you’ll see

EVALUATE
TOPN (
    102,
    SUMMARIZECOLUMNS (
        ROLLUPADDISSUBTOTAL (
            ROLLUPGROUP ( 'Month'[Month Name], 'Month'[Month Number] ), 
            "IsGrandTotalRowTotal"
        ),
        "Share", 'Sales'[Share],
        "Sum_of_Sales", 'Sales'[Sum of Sales]
    ),
    [IsGrandTotalRowTotal], 0,
    'Month'[Month Number], 1,
    'Month'[Month Name], 1
)
ORDER BY
    [IsGrandTotalRowTotal] DESC,
    'Month'[Month Number],
    'Month'[Month Name]

The difference here is that the Month Number and Month Name fields are both present – they have to be since the query has to sort by Month Number. In MDX the order of members on a hierarchy can be set inside the model; in a DAX query you can only sort using an ORDER BY clause and for that to work, the field you’re ordering by must be present in the query.

The Share measure calculation needs to be changed in order to fix this, then. Here’s one way of doing this:

Share =

DIVIDE([Sum of Sales],

CALCULATE([Sum of Sales], ALL(Month[Month Name], ‘Month'[Month Number])))

Calculated Tables In Power BI

There are loads of great new features in today’s release of Power BI Desktop, but for me the most important by far is the introduction of calculated tables. Miguel Llopis gives a good introduction to what they are in the post announcing the release, but I thought it was worth going into a bit more detail about what they are and why they’re so useful.

What are calculated tables?

Calculated tables are tables in the Data Model whose data source is a DAX expression that returns a table. Here’s a simple example. Imagine that you have already imported the DimDate dimension table from the Adventure Works DW database into your Power BI Data Model. If you go to the Data tab you would see the contents of that table shown, and on the ribbon you can see the new New Table button:

image

Clicking the button allows you to enter a new table name and a DAX expression that returns the table, such as this one that returns a filtered subset of the rows in the DimDate table:

MyFilteredDimDateTable =
FILTER (
    DimDate,
    DimDate[CalendarYear] = 2002
        && DimDate[EnglishMonthName] = "June"
        && DimDate[DayNumberOfMonth] < 5
)

image

Calculated tables are created when the data in the model is refreshed (like calculated columns), and after that behave like any other table – so you can create relationships between calculated tables and other tables. You can also create calculated tables whose DAX expressions reference other calculated tables. They do take up memory like other tables too, so over-using them could be a bad thing.

Why are calculated tables useful?

Miguel’s blog post already lists some of the scenarios where calculated tables are useful, and I can already think of lots of practical scenarios where I’m going to be using them myself.

Role playing dimensions are one obvious use: in a lot of models you need to use the same dimension table more than once in different places, with different relationships and maybe with different filters in place. It might be that you have a single Company dimension in your data warehouse that contains all of the companies your organisation does business with; with calculated tables you only need to import that table once, and you can then use calculated tables to create filtered copies of that table to use as Supplier and Customer dimension tables, joining them to your Stock and Sales fact tables, and only showing the relevant companies in each case.

Certainly for debugging complex DAX expressions they’re going to be handy, because they allow you to see what DAX table expressions return. We’ve already DAX Studio for that but now we don’t have the hassle of switching to another application…!

I can also see calculated tables as a way of doing certain types of ETL – which raises the question of whether you should do a certain operation in Get Data (ie what was Power Query) or using a calculated table. I strongly suspect that a lot of operations are going to be much faster with calculated tables because of the power of the underlying engine. It would be interesting to know if there are plans to allow Get Data to make use of calculated tables, for example as a way of buffering tables in memory, with M transformations folded back to DAX on those tables.

The Calendar() and CalendarAuto() functions

If you were wondering what the new DAX Calendar() and CalendarAuto() functions were for, well, you can probably guess now – Date tables. The Calendar() function returns a table of continuous dates within a given range. So, the expression

CalendarDemo =
CALENDAR ( "1/1/2015", "2/2/2015" )

Will return a table with one column containing all the dates from January 1st 2015 to February 2nd 2015:

image

The CalendarAuto() function looks at all of the Date columns in all of the other tables in the model, and returns a similar table but one where the first date is the beginning of the year that contains the earliest date found in any non-calculated column in any non-calculated table, and where the last date is the end of the year that contains the latest date found in any non-calculated column in any non-calculated table. By default the beginning of the year is January 1st and the end of the year is December 31st, but there’s also an optional parameter to specify a different month to end the year on, if you want to create a fiscal calendar table.

I wonder if we’ll get something like the functionality in Excel 2016 for creating date tables with other useful columns?