Power BI Report Builder And RSCustomDaxFilter

If you’re building DAX queries using Power BI Report Builder you might notice something that looks like a new DAX function called RSCustomDaxFilter. For example, here’s a simple DAX query built from the Adventure Works Tabular demo database, with one measure and one parameter built Calendar Year:

image

Note that the “Enable Multi Value Parameters” option has been selected. Here’s what you’ll see if you view the text of the DAX query in the Query Designer:

EVALUATE 
SUMMARIZECOLUMNS(
RSCustomDaxFilter(
 @DateCalendarYear,
 EqualToCondition,
 [Date].[Calendar Year],
 Int64
 ), 
"Internet Total Sales", 
[Internet Total Sales]
)

But what is RSCustomDaxFilter? If you run your report and see what happens on the server using SQL Server Profiler, here’s the query that actually gets run (in this case I selected the years 2013 and 2014 for the parameter):

EVALUATE
SUMMARIZECOLUMNS (
    FILTER (
        VALUES ( 'Date'[Calendar Year] ),
        ( 'Date'[Calendar Year] = VALUE ( "2013" ) )
            || ( 'Date'[Calendar Year] = VALUE ( "2014" ) )
    ),
    "Internet Total Sales", [Internet Total Sales]
)

What has happened is that RSCustomDaxFilter has been replaced with an expression using the DAX Filter() function that implements the filter on the selected years; it’s just a placeholder for a dynamically-generated DAX expression that is substituted in at runtime.

Why is it needed? Handling multi-value parameters is difficult in DAX when you don’t know how many values are going to be passed to the parameters (it’s a subject I’ve blogged about here and here) and some kind of dynamic code generation is a reasonable solution to this problem.

There is one drawback with this approach though – it can generate a DAX query that is too long to be executed. Here’s the error message you’ll see:

image

The specified query is too complex to be evaluated as a single statement.

I’m not sure what the maximum length of a query is in DAX – I suspect it’s 32768 characters. You’ll hit this limit if you create a parameter on a field with thousands of values in and then select all but a few of these values; from the example above you can imagine how long the resulting DAX query could be:

image

There’s no single workaround for this but some custom DAX (such as the example here) will be necessary; you’ll just need to ensure that the DAX query is as short as possible.

SSASDiag: A Tool To Help Diagnose Analysis Services Problems

There are a lot of great community-developed tools out there for Analysis Services developers to use (BI Developer Extensions, DAX Studio, Tabular Editor, Analysis Services Query Analyzer to name a few) and they have saved me vast amounts of time and effort over the years. When I joined Microsoft last month I came across one which I had never seen before but which is nevertheless quite mature and feature-rich: the SSAS Diagnostics Tool or SSASDiag for short. It’s available on GitHub here:
https://github.com/ssasdiag/SSASDiag

…and you can read the documentation here:
https://github.com/ssasdiag/SSASDiag/wiki/SSAS-Diagnostics—Analysis

image

It’s an open source tool developed by the people who support Analysis Services here at Microsoft and is intended to help them collect and analyse the information they need to troubleshoot on-premises SSAS  issues, but it’s available for anyone to use. I haven’t had a chance to take a proper look at it yet myself, unfortunately, but I thought it would be interesting for any SSAS fans out there to check out.

[Thanks to Jon Burchel for providing all the background information for this post]

DAX Median() Function Does Not Work On Tables With More Than 2 Billion Rows

An interesting – if obscure – fact I learned recently is that a small number of DAX functions such as Median() do not work on tables with more than 2 billion rows in Analysis Services Tabular, Azure AS and Power BI.

It’s quite easy to reproduce in Power BI. The following M expression returns a table with two billion and four rows:

let
    Source = 
    List.Repeat(
        {1,2,3,4},
        500000001
        ),
    #"Converted to Table" = 
    Table.FromList(
        Source, 
        Splitter.SplitByNothing(), 
        null, 
        null, 
        ExtraValues.Error
        ),
    #"Changed Type" = 
    Table.TransformColumnTypes(
        #"Converted to Table",
        {{"Column1", Int64.Type}}
        )
in
    #"Changed Type"

It takes some time to load this table  – around twenty minutes – but because there are only four distinct values in the table the resulting .pbix file is only 31KB thanks to the way Power BI compresses data.

If you load this table into your dataset, call it VeryBigTable and create the following measure:

Median Test = MEDIAN(VeryBigTable[Column1])

…and use the measure in a visual, you’ll see the following error:

image

The current query cannot be evaluated for the ‘VeryBigTable (42)’ table, because the table contains more than two billion rows.

What’s more, the error will always occur even if you apply a filter to the table that returns less than two billion rows. The same problem occurs with some other functions, such as Percentile(), but it’s worth pointing out that the vast majority of DAX functions work as normal with tables with more than two billion rows – for example, in the pbix file used here the Sum() and CountRows() functions not only work fine but return instantly.

Luckily, in the case of the Median() function, there is an easy workaround because you can calculate a median in other ways such as the one described on the DAX Patterns site here. The code is a lot more verbose but it works on a 2 billion+ row table.

image

SSAS Tabular 2019, Calculation Groups And Migration From SSAS Multidimensional

With the release of CTP 2.3 of SQL Server 2019 today there was big news for Analysis Services Tabular developers: Calculation Groups. You can read all about them in detail in this blog post:

https://blogs.msdn.microsoft.com/analysisservices/2019/03/01/whats-new-for-sql-server-2019-analysis-services-ctp-2-3/

In my opinion this is the most important new feature in DAX since… well, forever. It allows you to create a new type of calculation – which in most cases will be a time intelligence like a year-to-date or a previous period growth – that can be applied to multiple measures; basically the same thing that we have been doing in SSAS Multidimensional for years with the time utility/shell/date tool dimension technique. It’s certainly going to solve a lot of problems for a lot of SSAS Tabular implementations, many of which have hundreds or even thousands of measures for every combination of base measure and calculation type needed.

I’m not going to repeat any of the detailed technical information in the blog post here, though. Instead the point I want to make is that this is very big news for SSAS Multidimensional users too. In the past couple of years many people who have existing SSAS Multidimensional implementations have thought about migrating to SSAS Tabular so they can take advantage of its new features or move to the cloud, and indeed many of them have already migrated successfully. However, up to now, the biggest blocker for those wanting to migrate from Multidimensional to Tabular has been the fact that some complex calculations that can be expressed in MDX cannot be recreated (or recreated easily and efficiently) in DAX, because DAX has not had an equivalent of calculated members not on the Measures dimension or the MDX SCOPE statement.

Calculation groups do not remove this roadblock completely, but they do remove the roadblock for a large group of existing SSAS Multidimensional users whose only complex calculation requirement is a time utility/shell/date tool dimension. As a result these SSAS Multidimensional users will now be able to migrate to SSAS Tabular 2019, Azure Analysis Services or Power BI if they want to. Only those people who have more exotic uses for calculated members not on the Measures dimension (which are not very common at all) and those who use SCOPE statements (a larger group – many people working with financial data use SCOPE statements heavily) will find that Multidimensional is still the correct platform for them.

How The New IsAvailableInMDX Property For Analysis Services Tabular Can Reduce Memory Usage And Speed Up Processing

Following on from my post the other week about the new RowsetSerializationLimit server property, I thought it would be a good idea to write about why the new IsAvailableInMDX property (announced in the same blog post) is so important. In fact, I would say that everyone using Analysis Services Tabular 2017 (CU7 or higher) or Azure Analysis Services should spend some time investigating it because the potential benefits in terms of reduced memory usage and faster processing times are significant, especially for larger models.

First of all, what does it actually do? As the blog post says, it allows you to stop attribute hierarchies from being built on columns when you don’t need them. But what are attribute hierarchies? They are structures that are used only when you are querying your Tabular model using MDX; Excel PivotTables, for example, generate MDX queries when they are connected to Analysis Services Tabular whereas Power BI always generates DAX queries. An attribute hierarchy allows a column on a table to be used on the rows or columns axis of an MDX query, and in Excel that means you will be able to drag that field onto the rows or columns area of a PivotTable. Attribute hierarchies are used by some DAX functionality too – for example the TreatAs() function (at least for now) needs them to be present to work. Frustratingly, the DAX functionality that does need attribute hierarchies is not documented.

To give you an example, consider a Tabular model that contains a table with three columns, Product, Customer and Sales, and a measure that sums up the values in the Sales column.

image

 

I can query this Tabular model in Power BI, for example by creating a Matrix visualisation:

image

I can also get the same values out using an Excel PivotTable:

image

Now the reason I can create this PivotTable is that Analysis Services Tabular has created attribute hierarchies on the Customer and Product columns. However, the important thing to understand is that Analysis Services Tabular creates attribute hierarchies on every column on every table by default, including the Sales column. This allows me to create a PivotTable like this, with the distinct values from Sales on the rows of the PivotTable:

image

image

You’re probably thinking, why would I ever want to use Sales – a measure column – like this? And the answer is you probably wouldn’t, even though Tabular allows this by default. What’s more, building the attribute hierarchy for Sales makes processing slower and the resulting hierarchy uses memory, so all this comes as a cost. The IsAvailableInMDX property is therefore very useful because it allows you to stop attribute hierarchies from being built on columns like Sales where they serve no real purpose.

Unfortunately at the time of writing SSDT doesn’t allow you to set the IsAvailableInMDX property but the good news is that the latest versions of Tabular Editor do:

image

Setting IsAvailableInMDX to false for the Sales field has no impact at all in Power BI, so long as you are not using functionality like TreatAs() that needs it. In Excel, it just means that it is no longer possible to drag Sales onto rows or columns in a PivotTable – the Sales Amount measure still works:

image

As a result, there are two recommendations that can be made:

  • If you are not using any client tools that generate MDX queries (such as Excel) or you want to prevent your users from using them, and you can be sure that it does not affect any of your existing Power BI reports or DAX calculations, you can set IsAvailableInMDX to false on every column of every table
  • If you are using client tools that generate MDX you can still probably set IsAvailableInMDX to false on every measure column and not lose any important functionality

How much of an impact will doing this have on processing times and memory usage? It depends, but it could be a lot. The anecdotal evidence on Twitter is promising:

image

image

I did my own (not particularly scientific) test using a table with five million rows and ten columns, each of which contained the integers between one and five million. Here’s the M query to generate such a table without the need for an external data source:

let
    Source =
	Table.FromColumns(
		List.Repeat(
			{{1..5000000}},
			10
		)
	),
    #"Changed Type" =
	Table.TransformColumnTypes(
		Source,
		List.Transform(
			Table.ColumnNames(Source),
			each {_, Int64.Type}
		)
	)
in
    #"Changed Type"

On my laptop, with IsAvailableInMDX set to true for all ten columns, a full process on this table took around 105 seconds and the table size reported by Vertipaq Analyzer was 381MB. After changing IsAvailableInMDX to false for all ten columns, the time for a full process went down to around 81 seconds and the table size was down to 191MB.

In summary, this is one of those seemingly obscure technical changes that turns out to be way more useful than you might think. If you test out setting IsAvailableInMDX on your Tabular model, please leave a comment letting me know what kind of impact it had!

[Thanks to Daniel Otykier for providing a lot of information for this post]

A New Approach To Handling SSRS Multi-Valued Parameters in DAX Queries

Back in 2012 I wrote a blog post explaining how to handle multi-value parameters in DAX queries in Reporting Services reports. The approach I took back then was basically:

  1. Generate a string containing a pipe-delimited list of all the parameter values that the user has selected (I did this in DAX too, but to be honest it’s better to use the SSRS Join function to do this outside the DAX query)
  2. Use the DAX PathContains() function in a Filter() to check whether the value on the current row of the table being filtered appears in the pipe-delimited list

Here’s a deliberately simplified example of how this works based on Adventure Works DW data. The following query filters the FactInternetSales fact table and returns the rows for the Sales Order Numbers that are present in the OrderList variable:

EVALUATE
VAR OrderList = "SO43713|SO43758|SO43784|SO43821"
RETURN
    FILTER (
        FactInternetSales,
        PATHCONTAINS(OrderList, FactInternetSales[SalesOrderNumber])
    )

The trouble with this approach is that is that it can be very slow. Running a trace in DAX Studio for the query above reveals the problem:

image

The presence of CallbackDataID shows that the Storage Engine is calling the Formula Engine to handle the use of PathContains() in the filter, and this is often a cause of poor query performance. However back when I wrote the post the only alternative was, as Chris Koester points out here, to dynamically generate the entire DAX query as an SSRS expression and that is very painful to do.

The good news is that recent changes in DAX mean that there is another way to tackle this problem that can give much better performance. Here’s an example of this new approach:

EVALUATE
VAR OrderList = "SO43713|SO43758|SO43784|SO43821"
VAR OrderCount =
    PATHLENGTH ( OrderList )
VAR NumberTable =
    GENERATESERIES ( 1, OrderCount, 1 )
VAR OrderTable =
    GENERATE (
        NumberTable,
        VAR CurrentKey = [Value]
        RETURN
            ROW ( "Key", PATHITEM ( OrderList, CurrentKey ) )
    )
VAR GetKeyColumn =
    SELECTCOLUMNS ( OrderTable, "Key", [Key] )
VAR FilterTable =
    TREATAS ( GetKeyColumn, FactInternetSales[SalesOrderNumber] )
RETURN
    CALCULATETABLE ( FactInternetSales, FilterTable )

Broken down variable by variable, here’s how it works:

  1. OrderList is the pipe-delimited list of key values passed from SSRS
  2. OrderCount uses the PathLength() DAX function to find the number of parameter values in this list
  3. NumberTable uses the GenerateSeries() function to create a table of numbers with one row for each number between 1 and the number of parameter values in the list
  4. OrderTable uses the trick Marco describes here to iterate over NumberTable and, for each row, uses the PathItem() function to return one parameter value from the list for each row in the able
  5. GetKeyColumn uses the SelectColumns() DAX function to only return the column from OrderTable that contains the parameter values
  6. FilterTable uses the TreatAs() DAX function to take the table of values returned by GetKeyColumn and treat them as values in the FactInternetSales[SalesOrderNumber] column
  7. Finally, the query returns the contents of the FactInternetSales table filtered by the values in FilterTable using the CalculateTable() DAX function

There’s a lot of extra code here and in some cases you may find that performance with smaller data volumes is worse as a result, but in this particular case the new approach is twice as fast at the old one. There’s certainly no CallBackDataID:

image

Deprecated And Discontinued Functionality in SSAS 2017

In the past I’ve blogged about deprecated and discontinued functionality in SSAS 2014 and SSAS 2016; I forgot to check what’s deprecated and discontinued in SSAS 2017 until last week but it turns out that there are a few things that are worth knowing.

Here’s the link to the official documentation:

https://docs.microsoft.com/en-us/sql/analysis-services/analysis-services-backward-compatibility-sql2017?view=sql-analysis-services-2017

…and here are the definitions of ‘deprecated’ and ‘discontinued’:

A deprecated feature will be discontinued from the product in a future release, but is still supported and included in the current release to maintain backward compatibility. It’s recommended you discontinue using deprecated features in new and existing projects to maintain compatibility with future releases.

A discontinued feature was deprecated in an earlier release. It may continue to be included in the current release, but is no longer supported. Discontinued features may be removed entirely in a future release or update.

As far as discontinued features go it’s straightforward: everything that was deprecated in SSAS 2016 is now discontinued. For SSAS MD that means remote partitions, remote linked measure groups, dimension writeback and linked dimensions are now discontinued; I don’t think these features were ever used by more than a small number of people. Profiler is discontinued too and that’s more of a problem, given that the UI for Extended Events in SSMS remains awful and unusable for the kind of query performance tuning tasks I use Profiler for (I blogged about this issue here). The state of tooling for SSAS is already pretty bad and if Profiler stops working in the future the situation will be even worse; is it right that we have to rely on community-developed tools like DAX Studio and Analysis Services Query Analyzer, however good they are, for tasks like performance tuning?

UPDATE 30th April 2018: it turns out that Profiler was put on the ‘discontinued’ list by accident, and in fact is still only deprecated. The documentation has now been updated appropriately.

There are two important deprecated features:

  • SSAS Multidimensional data mining. Given that it has not had any new features now for a long, long time (even longer than the rest of SSAS MD) and was never very popular in the first place, I’m not surprised. However the example of Microsoft’s first, failed attempt at brining data mining to a wider audience is interesting in the light of the company’s attempts to do the same thing with Azure Machine Learning and other services. As far as I understand it the technology was never the problem and it was about as easy to use as it could be, so why did it fail? I’m not the right person to answer this question but I suspect the reasons include the following: Microsoft BI customers were not ready for data mining back when it was first launched; customers who did want data mining didn’t want to buy a product from Microsoft; very few Microsoft partners had the skills or experience to sell it; and finally is it even possible to do proper data science in a user-friendly GUI with no coding?
  • SSAS Tabular models at the 1100 and 1103 compatibility level (for SSAS 2012 and SSAS 2012 SP1). Anyone that is still running Tabular models at this compatibility level really needs to upgrade, because they’re missing out on the great new features that have appeared in SSAS 2016 and 2017.
%d bloggers like this: