The Second Edition Of “The Definitive Guide To DAX” Is Out!

If you’re a Power BI fan there are three possible answers to the question “Did you know that the second edition of The Definitive Guide To DAX has just been published?”:

Answer#1: Yup, I’ve already got my copy!

If this is your answer there’s no need to read any further.

Answer #2: What’s “The Definitive Guide To DAX”?

If, on the other hand, you’re new to Power BI and this is what you’re thinking then I should explain that “The Definitive Guide To DAX” is a book by Marco Russo and Alberto Ferrari and is what its title suggests it is – the sum total of human knowledge about the DAX calculation and query language used by Power BI, written by the two people who know most about it outside the development team. Marco and Alberto are friends of mine but I don’t think anyone can accuse me of bias when I say that it’s a book that every Power BI developer needs to own, so go out and buy it! If you use Power BI you need to learn DAX and while this book may not be a simple step-by-step tutorial it has in it somewhere answers to just about every question you’ll ever ask about DAX – and, more importantly, the answers it has are as correct and as up-to-date as they possibly can be. I can tell you that it’s proved invaluable to me in my work at least twice in the last week alone.

Answer #3: Yes, I saw that but I already have the first edition – is it worth buying this one too?

This is a slightly more difficult question to answer, but I’m still going to recommend that you buy the second edition. As Marco says in his announcement blog post, a lot of the existing content has been updated and rewritten and a lot of new content has been added. If you care about following all the latest DAX best practices and you don’t want the new hire in your department to mock you because you’ve never heard of DAX Studio, you need to buy this new edition.

[Note: I didn’t get a free copy of this book for review (yet?) but I have an O’Reilly Online Learning account which means I could read it as soon as it was published]

PS I know someone needs to write the “Definitive Guide to M” but it’s not going to be me, at least not right now.

DAX Median() Function Does Not Work On Tables With More Than 2 Billion Rows

An interesting – if obscure – fact I learned recently is that a small number of DAX functions such as Median() do not work on tables with more than 2 billion rows in Analysis Services Tabular, Azure AS and Power BI.

It’s quite easy to reproduce in Power BI. The following M expression returns a table with two billion and four rows:

    Source = 
    #"Converted to Table" = 
    #"Changed Type" = 
        #"Converted to Table",
        {{"Column1", Int64.Type}}
    #"Changed Type"

It takes some time to load this table  – around twenty minutes – but because there are only four distinct values in the table the resulting .pbix file is only 31KB thanks to the way Power BI compresses data.

If you load this table into your dataset, call it VeryBigTable and create the following measure:

Median Test = MEDIAN(VeryBigTable[Column1])

…and use the measure in a visual, you’ll see the following error:


The current query cannot be evaluated for the ‘VeryBigTable (42)’ table, because the table contains more than two billion rows.

What’s more, the error will always occur even if you apply a filter to the table that returns less than two billion rows. The same problem occurs with some other functions, such as Percentile(), but it’s worth pointing out that the vast majority of DAX functions work as normal with tables with more than two billion rows – for example, in the pbix file used here the Sum() and CountRows() functions not only work fine but return instantly.

Luckily, in the case of the Median() function, there is an easy workaround because you can calculate a median in other ways such as the one described on the DAX Patterns site here. The code is a lot more verbose but it works on a 2 billion+ row table.


Storing Large Images In Power BI Datasets

Jason Thomas and Gerhard Brueckl have both blogged on the subject of storing images as text inside a Power BI dataset:

Since they wrote those posts, however, Power BI has added the ability to set the Data Category property on measures as well as columns in tables. This means it is now possible to have the output of a DAX measure displayed as an image in a Power BI report and this in turn opens up a lot of new possibilities – including the ability to work around the maximum size of a text value that can be loaded into Power BI (see my previous blog post for more details) and therefore work with larger images.

Here’s a rather lovely picture of a rose:

2014-08-18 17.02.10_20Pct

The original is about 2.1MB; I have a folder on my PC where different versions of this picture, saved at different percentages of the original size, are stored:


Using the technique that Gerhard wrote about, where the pictures can be stored as text in a single cell in a Power BI dataset and then displayed (in this case I’m using the Image by CloudScope custom visual) some truncation of the image occurs even with the smallest files because of the 32766 character limit on the length of a text value that can be loaded into Power BI. Here’s what you see when you display the version of the picture that is 20% of the original size, a file of only 113KB:


To work around this, what you need to do is to split the text representation of the image up into multiple smaller text values stored across multiple rows, each of which is less than the 32766 character limit, and then reassemble them in a DAX measure after the data has been loaded.

Splitting the text up in M is actually not that hard, but it is hard to do efficiently. Here’s an example of an M query that reads all the data from all of the files in the folder above and returns a table:

    //Get list of files in folder
    Source = Folder.Files("C:\Users\Chris\Documents\PQ Pics"),
    //Remove unnecessary columns
    RemoveOtherColumns = Table.SelectColumns(Source,{"Content", "Name"}),
    //Creates Splitter function
    SplitTextFunction = Splitter.SplitTextByRepeatedLengths(30000),
    //Converts table of files to list
    ListInput = Table.ToRows(RemoveOtherColumns),
    //Function to convert binary of photo to multiple
    //text values
    ConvertOneFile = (InputRow as list) =>
            BinaryIn = InputRow{0},
            FileName = InputRow{1},
            BinaryText = Binary.ToText(BinaryIn, BinaryEncoding.Base64),
            SplitUpText = SplitTextFunction(BinaryText),
            AddFileName = List.Transform(SplitUpText, each {FileName,_})
    //Loops over all photos and calls the above function
    ConvertAllFiles = List.Transform(ListInput, each ConvertOneFile(_)),
    //Combines lists together
    CombineLists = List.Combine(ConvertAllFiles),
    //Converts results to table
    ToTable = #table(type table[Name=text,Pic=text],CombineLists),
    //Adds index column to output table
    AddIndexColumn = Table.AddIndexColumn(ToTable, "Index", 0, 1)

In my next post I’ll show you my original version of this query, explain why it was slow, and try to explain how the version above works and why it is much faster.

Here’s what the query above returns:


The Pic column contains the split text values, each of which are less than the 32766 character limit, so when this table is loaded into Power BI no truncation occurs. The index column is necessary because without it we won’t be able to recombine all the split values in the correct order.

The only thing left to do is to create a measure that uses the DAX ConcatenateX() function to concatenate all of the pieces of text back into a single value, like so:

Display Image = 
    HASONEVALUE('PQ Pics'[Name]),
    "data:image/jpeg;base64, " & 
        'PQ Pics', 
        'PQ Pics'[Pic],
        'PQ Pics'[Index],

…set the data category of this measure to be “Image URL”:


…and then display the value of the image in a report:



Unfortunately, as I also mentioned in my previous post, most DAX functions (and that includes ConcatenateX()) have a limit of around 2.1 million characters so the original 2.1MB file still can’t be displayed, alas:


However, I do think this technique will be useful because it allows you to work with much larger pictures than before.

It can also be useful in other situations too. I recently came across a great new custom visual called PDF Viewer that can display PDF files stored in text form in a Power BI report:


The example file for this visual shows how a large PDF file can be split across two columns in a table; the technique I describe here is a more practical solution to this problem.

What Is The Maximum Length Of A Text Value In Power BI?

What is the maximum length of a text value in Power BI? It turns out that this is a more complex question than you might think!

The maximum length of a text value that the Power Query engine can load into a single cell in a table in a dataset is 32766 characters – any more than that and the text will be silently truncated. However, if you’re working with text inside the Power Query engine you’ll find that you can work with much longer text values.  To illustrate this, consider the following M query:

    Source = 
            type table[charcount = number],
    #"Added Custom" = 
            each Text.Repeat("1", [charcount]),
            type text
    #"Inserted Text Length" = 
            #"Added Custom", 
            each Text.Length([LongText]), 
    #"Inserted Text Length"

It creates a table with four rows and three columns. The first column contains the numbers 1, 10000, 30000 and 40000; the second column contains the character “1” repeated the number of times given in the first column; the third column returns the length of the text in the second column using the Text.Length() M function. Here’s the output in the Power Query Editor, which is pretty much as you’d expect:


I’m not sure if there is a maximum length for text values in M; I experimented with adding an extra row to the table above with a 900,000,000 character text value and Text.Length() was able to return the correct value, albeit after a bit of a wait.

Load the table above into your Power BI dataset though, and add a DAX calculated column with the following expression:

DAX Length = LEN('LengthsDemo'[LongText])

…and you can see in the Data pane of the main Power BI Desktop window that the long text value in the last row has been truncated to 32766 characters:


Once you’ve loaded your data into Power BI the documentation says that the maximum length of a text value is “268,435,456 Unicode characters (256 mega characters) or 536,870,912 bytes”. The bad news is that many DAX functions such as ConcatenateX() put a limit on the length of the text values that they can work with at around 2.1 million characters (thank you Jeffrey Wang for providing this information – it isn’t documented anywhere at the moment). If you exceed this limit you’ll get the following error:

Function ‘PLACEHOLDER’ encountered a Text that exceeds the maximum allowable length.

In summary, then, there are two different practical limits on the maximum length of a text value in Power BI: the 32766 character limit on text being loaded into Power BI, and the 2.1 million character limit in DAX functions. The first of these can be worked around with some clever M – you need to split long text values up into multiple smaller values stored in different columns or rows – but even if you do this, the second limit may stop you recreating the original value after the data has been loaded.

Why is this useful or important? How can you split text values up in M in the most efficient way? I’ll come to that in my next two posts!

DAX Machine Learning Functionality Used By The Key Influencers Visual In Power BI

I’m one of those people who can’t resist peeking behind the scenes, and so when the Key Influencers visual appeared in Power BI I couldn’t help wondering how it worked its machine learning magic. Using DAX Studio to look at the DAX queries generated by the visual proved to be very revealing: it turns out that it uses a number of new DAX functions that are undocumented and probably not meant to be used outside Microsoft. For example, the following screenshot shows a DAX query generated by the Key Influencers visual that uses functions called AI.SampleStratified, AI.Train, AI.KeyDrivers and AI.ExtractProfileFilters:


Using Profiler (in a similar way to what I describe in this post) to go into even more detail about what happens when these queries run, shows that they raise the DAX Extension events that I’ve been wondering about for a long time now:



So Power BI can train and query machine learning models inside its own database engine – which, when you think about it, is pretty darned cool. And then I thought: hold on, other visuals have had machine learning features for a long time. For example, the Line Chart visual can create forecasts, but although DAX Studio shows yet another undocumented function called SampleAxisWithLocalMinMax() this does not actually seem to perform the forecasting, which I assume must be done inside the code of the visual itself:


My guess is that the functionality used by the Key Influencers visual is new functionality in the engine.

A fascinating insight into how Power BI works, but is this any practical use to us? Let me be clear: I don’t think you should be using any of these functions yourself in a real-world report. I’m sure all this would be documented and publicised if Microsoft did want us to use it ourselves! Another consideration is that these new functions return tables and that makes them awkward to use in regular .pbix Power BI reports – I guess we could create calculated tables although that’s not as flexible as returning a table from a query as shown above. That said, even though we can’t write our own DAX queries in regular Power BI reports, we can write our own DAX queries in Paginated Reports and we can now create Paginated Reports that use a Power BI dataset as a data source. I tested putting one of the queries generated by the Key Influencers visual into a Paginated Report connected to the same dataset and it worked ok (even after publishing). You can also embed DAX queries connected to a published dataset in Excel too, as I show here. Hmm, plenty to think about then…

SSAS Tabular 2019, Calculation Groups And Migration From SSAS Multidimensional

With the release of CTP 2.3 of SQL Server 2019 today there was big news for Analysis Services Tabular developers: Calculation Groups. You can read all about them in detail in this blog post:

In my opinion this is the most important new feature in DAX since… well, forever. It allows you to create a new type of calculation – which in most cases will be a time intelligence like a year-to-date or a previous period growth – that can be applied to multiple measures; basically the same thing that we have been doing in SSAS Multidimensional for years with the time utility/shell/date tool dimension technique. It’s certainly going to solve a lot of problems for a lot of SSAS Tabular implementations, many of which have hundreds or even thousands of measures for every combination of base measure and calculation type needed.

I’m not going to repeat any of the detailed technical information in the blog post here, though. Instead the point I want to make is that this is very big news for SSAS Multidimensional users too. In the past couple of years many people who have existing SSAS Multidimensional implementations have thought about migrating to SSAS Tabular so they can take advantage of its new features or move to the cloud, and indeed many of them have already migrated successfully. However, up to now, the biggest blocker for those wanting to migrate from Multidimensional to Tabular has been the fact that some complex calculations that can be expressed in MDX cannot be recreated (or recreated easily and efficiently) in DAX, because DAX has not had an equivalent of calculated members not on the Measures dimension or the MDX SCOPE statement.

Calculation groups do not remove this roadblock completely, but they do remove the roadblock for a large group of existing SSAS Multidimensional users whose only complex calculation requirement is a time utility/shell/date tool dimension. As a result these SSAS Multidimensional users will now be able to migrate to SSAS Tabular 2019, Azure Analysis Services or Power BI if they want to. Only those people who have more exotic uses for calculated members not on the Measures dimension (which are not very common at all) and those who use SCOPE statements (a larger group – many people working with financial data use SCOPE statements heavily) will find that Multidimensional is still the correct platform for them.

The Is Nullable Column Property In Power BI

The new Relationships view (or Modeling view – it seems to have two names) in Power BI Desktop that has been in preview since November 2018 not only makes it easier to work with complex models and set properties more easily, it also exposes a brand new property on a column: the “Is nullable” property. It’s visible at the bottom of the new Properties pane when you click on a column:


I was, naturally, curious about what it did and I couldn’t find any documentation so I did a bit of investigation of my own and asked a few people at Microsoft.

It turns out that it is primarily intended for validation purposes, so that if you know a column should never contain a null value and then, at a later date, a null value does appear in that column then you’ll get the following error when you try to refresh a table in Import mode:


Column ‘MyColumn’ in Table ‘TestTable’ contains blank values and this is not allowed for columns on the one side of a many-to-one relationship or for columns that are used as the primary key of a table

The only other use I have found for it is when it is used in combination with the DAX CombineValues() function in DirectQuery mode. In his excellent article on this subject, Marco notes that the SQL generated when you use CombineValues() to concatenate values from multiple columns and create a calculated column that can be used in a relationship includes a check for null values in those columns. I can confirm that if you turn off the ‘is nullable’ property on all the columns used by CombineValues(), there are no checks for null values in the SQL queries.

There may be a few other things that it does, but I’ve been told they are likely to be there to ensure correctness rather than performance optimisations. Nonetheless if you do find other scenarios where it has an effect please let me know in a comment.

[Thanks to Akshai Mirchandani and Jeffrey Wang for providing much of the information used in this post]

Complex Filter Conditions In Power BI Reports Using Visual-Level Filters And Measures

While Power BI has a lot of great functionality for filtering the data that appears in a visual using visual-level filters there are going to be situations where you need more control over how filtering works, and in this blog post I’m going to show you how you can use visual-level filters and measures in combination to achieve this. The example I show here may be quite specific but I think the general technique has a lot of other interesting applications.

Let’s say you have a table containing customer purchases, with potentially more than one purchase per customer, and want to display a table in Power BI showing only the most recent purchase for each customer relative to a given date.

Using data from the Adventure Works DW database, here’s the Internet Sales table from my Power BI example dataset that contains the customer purchases. Each purchase has a Sales Order Number and a Sales Order Line Number, and there may be multiple line items in a single sales order.


Here’s some data from this table, filtered down to show the purchases of one customer: Abby Subram.


[Note: all dates shown are in dd/mm/yyyy format]

The requirement is that the report user should be able to select a date in a slicer and show the details for the most recent sales order that occurs before the selected date. So, for example, if the user selects 1/1/2003 then no rows should appear; if 16/6/2003 is selected then only SO50934 should appear; if 30/9/2003 is selected only the four rows relating to SO54975 (and not SO50934) should appear; if 28/6/2004 is selected only the four rows relating to SO73938 should appear, and so on.

The first step is to create a create a date table that contains the ‘as of’ dates the user can select from, using the disconnected slicer technique: basically a date table that has no relationship at all to the Internet Sales table shown above.


Next comes the tricky part. Visual-level filters can be used to filter the contents of visuals in a Power BI report based on the value of a measure, and that measure does not need to be displayed in a report. What you therefore have to do is create a measure that will return the value 1 for data that should be displayed (in this case, the most recent purchase order for the customer) and blank otherwise, then apply a visual-level filter using this measure and filter on this measure equalling 1. Marco and Alberto have a great article going into the details of how measures behave when they are used in visual-level filters here that I recommend you read before carrying on.

For this particular scenario here’s my measure:

Is Latest SO =
VAR CurrentAsOfDate =
    SELECTEDVALUE ( 'As Of Date'[DateKey] )
VAR CurrentSODate =
    SELECTEDVALUE ( 'Internet Sales'[OrderDateKey] )
VAR CurrentCustomer =
    SELECTEDVALUE ( 'Internet Sales'[CustomerKey] )
VAR SameCustomerSOs =
    FILTER (
        ALL ( 'Internet Sales'[CustomerKey], 'Internet Sales'[OrderDateKey] ),
        'Internet Sales'[CustomerKey] = CurrentCustomer
            && 'Internet Sales'[OrderDateKey] <= CurrentAsOfDate
            && 'Internet Sales'[OrderDateKey] > CurrentSODate
VAR NoLaterSOs =
    IF ( COUNTROWS ( SameCustomerSOs ) > 0, BLANK (), 1 )
    IF ( CurrentSODate <= CurrentAsOfDate, NoLaterSOs )

The logic here is:

  • Assume that this measure will be used to filter a table visual in a Power BI report, where each row in the table visual displays data from a single row in the Internet Sales table in the dataset.
  • For each row in the Internet Sales table, find the current values for OrderDateKey and CustomerKey, as well as the date selected in the ‘as of date’ slicer, and store them in variables
  • Create a table of all possible combinations of CustomerKey and OrderDateKey ignoring the current filter context, then filter it so you get all the sales orders for the current customer and where the order date is greater than the date of the current sales order and less than or equal to the selected ‘as of’ date. This is stored in the SameCustomerSOs variable.
  • If there are rows in the SameCustomerSOs table then the current row in Internet Sales should not be displayed because it is not the most recent sales order as of the selected date; if SameCustomerSOs is empty then the current row should be displayed, because there are not later sales orders for the current customer before the selected as of date.
  • Finally, return 1 if SameCustomerSOs is blank and if the current row in Internet Sales has an order date before the selected as of date.

You can then create a table (or some other visual) in your Power BI report and, in the visual-level filters for that table, drag in the measure and apply a filter on the value of the measure equalling 1:


Here’s the result, showing the data for Abby Subram:



The one thing you do need to be careful of when using this technique is that the DAX you use in your measure must be as efficient as possible: inefficient DAX, complex filters and large data volumes will make your report unusably slow. I’m sure the DAX used in my measure above could be tuned to perform better (it currently takes around half a second to filter the full 60,000 rows in the Internet Sales table; Power BI only queries for and displays 500 rows at a time, which makes things faster) but the real point I wanted to make here is that using measures and visual-level filters in this way allows you to apply almost any complex filter condition you want to the data displayed in a visual.

You can download the sample pbix file for this post here.

Creating Sparklines And Small Multiples In Power BI Using The Google Image Charts API

The excellent work that David Eldersveld has been doing recently on using SVG images in Power BI has generated a lot of interest; now that the August 2018 release of Power BI Desktop allows you to set data categories on measures – and specifically the Image URL data category – David has been able to apply his work to the creation of sparklines and small multiples in Power BI. He’s already published an example in the Quick Measures gallery here; you might also want to check out this blog post and quick measure by Tom Martens too.

All this got me thinking: what if there was a simple web service that could generate charts, one that didn’t need any authentication, could work with simple GET requests and was free to use? Well guess what, there is one created by Google: the Google Image Charts API. It turns out that it makes it super easy to create sparklines and small multiples in Power BI with very little DAX needed.

Here are two examples of what’s possible with some simple DAX in a measure:



Both examples use data from a table called Sales with the following contents:


The first example, the Sales Sparkline measure, has the following definition:

Sales Sparkline = 
var BaseURL = 
var ChartType = 
var ChartScale = 
var ChartSize = 
var SeriesColour = 
var ChartData = 
    "&chd=t:" & 
    CONCATENATEX(Sales, Sales[Sales], ",", Sales[Month])
        BaseURL & ChartType & ChartScale & 
	ChartSize & SeriesColour & ChartData

All I’m doing is building up the URL to call the API from the parameters listed here. The second measure is basically the same as the first, but the chart type is driven by a selection in a slicer. It’s not bullet-proof code but it shows how easy it is to use the API.

You can see the report live here, and download the sample .pbix file here.

I’m not going to pretend that I’m a data visualisation expert and I know these examples are very basic; however, I think there are a lot of possibilities here for someone with better data visualisation skills than me.

Now for the bad news: the Google Image Charts API has been deprecated since 2012! Here’s the statement on the Google website:

While the dynamic and interactive Google Charts are actively maintained, we officially deprecated the static Google Image Charts way back in 2012. This gives us the right to turn it off without notice, although we have no plans to do so.

Also, there’s a fair usage policy too:

There’s no limit to the number of calls per day you can make to the Google Chart API. However, we reserve the right to block any use that we regard as abusive.

As a result I would not be confident about using it in production. If thousands of Power BI users started using the API it might prompt Google to block traffic from Power BI or turn off the API completely!

I do think Microsoft really needs to address the lack of native support for sparklines and small multiples in Power BI – the fact we still don’t have them is frankly embarrassing (even Power View had them, for crying out loud). Although one solution could be a DAX function that returned a chart image – basically taking David’s work and wrapping it up in a more user-friendly form – another approach would be for Microsoft to build on the Report Page Tooltips functionality. Since we can already design mini reports and make them appear as tooltips, why not make it possible for the same mini reports to appear in a cell in a table or matrix? You could then have measures that returned the name of a report page as a text value, and a new data category for them that tells Power BI to display the contents of the report page in the cell instead of the text. That way you could create small multiples from any visual or set of visuals.

UPDATE: thanks also to Tom Martens for reminding me to point out that this technique involves sending data to a third party, something which might cause legal problems for you.

More Details On The RowsetSerializationLimit Server Property For Analysis Services

The recent blog post on the Analysis Services team blog about new memory options contained information about some very interesting functionality that has just been added. The first of the new features I decided to try was the RowsetSerializationLimit server property, which restricts the number of rows returned by a query and which is intended to stop rogue users running queries that return very large amounts of data. It certainly works but there are two things to point out about it that aren’t immediately obvious.

First of all, an example of it in action. In Azure Analysis Services, using a database that contains one very simple table with one column and 5000 rows, the following DAX query returns all 5000 rows from the table:



To change the RowsetSerializationLimit server property, open SQL Server Management Studio, right-click on your server name in the Object Explorer pane and select Properties. Then go to the General page, check the Show Advanced (All) Properties box, and you’ll see the property listed under OLAP\Query\RowsetSerializationLimit:


The default value is –1, meaning no limit is placed on the number of rows returned by a rowset. Changing it to 100 and then running the above query results in the following error:


Executing the query …
Obtained object of type: Microsoft.AnalysisServices.AdomdClient.AdomdDataReader
The maximum number of rows (100) was reached.

A query that returns less than 99 rows, for example


…is ok though.

The important thing to understand is that although this applies to both DAX and MDX queries, it only applies to queries that are returned as rowsets (a tabular format) and not as cellsets (a multidimensional format). Most client tools that generate MDX, including Excel, use cellsets so this property will not apply to them unfortunately.

For example, the following MDX query run from a SQL Server Management Studio MDX query window (which also returns a cellset) on the same database runs ok and returns 5000 rows:

{[Measures].[Number Of Rows]} ON 0,
[MyTable].[Column1].[Column1].MEMBERS ON 1



However the same query run from Power BI Desktop to import data from the same database:


…runs, but returns only 100 rows and then an error:


Something else to point out is that this applies to all queries that return rowsets, even Discover requests and DMV queries. As a result, setting this property to a very low value can cause problems in tools like SQL Server Management Studio: for example if you set the property to 10 and you had 11 tables in a database, you would see an error when you tried to expand the Tables node of a database in the Object Explorer pane!

Even though this property only applies to rowsets I think it’s still going to be very useful, especially in scenarios where Power BI Desktop users are importing vast amounts of data from Analysis Services and causing major problems on the server.

%d bloggers like this: