Disabling Excel PivotTable Grouping And Session Cubes In SSAS Multidimensional

The Excel PivotTable grouping functionality that is available when you are connected to an SSAS Multidimensional cube (but not a Tabular model) is a Very Bad Thing indeed. In my experience it is a major cause of query performance problems – not just for the person running the query but, because it is so resource intensive, for everyone else trying to query the cube too. This post from a few years ago gives some more details about why custom grouping, and the CREATE SESSION CUBE MDX statements that it uses behind the scenes, are so expensive.

Up until now it has been impossible to prevent users from using custom grouping, but a recent cumulative update has changed this. First of all, I encourage you to read the details of the fix, because you will probably want to install the relevant CU for security reasons anyway:

https://support.microsoft.com/en-us/kb/3080856

[And if you are wondering whether you should be installing CUs I strongly suggest you read this post by Aaron Bertrand]

Once the CU has been installed the following new property is added to the msmdsrv.ini file:

\OLAP\Query\SessionCubesMode

By default this will be set to 1. Setting this property to 0 will prevent users from creating session cubes and therefore prevent them from using Excel’s custom grouping functionality.
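
For reference, msmdsrv.ini lives in the Config folder of your SSAS instance and is structured as XML, so based on the property path above I’d expect the relevant fragment to look roughly like this (all other elements omitted; back up the file before editing it by hand):

<ConfigurationSettings>
  <OLAP>
    <Query>
      <SessionCubesMode>0</SessionCubesMode>
    </Query>
  </OLAP>
</ConfigurationSettings>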

My opinion is that it’s a good idea to disable session cubes and custom grouping even if you don’t know whether your users are using these features. Yes, your users will lose some functionality and some reports might even break, but you will also save yourself and your users a lot of problems. If your users need to do custom grouping in a report, that is usually an indication that you have missed something in your dimension design, and that an extra attribute hierarchy or two is necessary.

[Thanks to Akshai for this information]

New Ways To Concatenate Text In Excel 2016 With CONCAT() And TEXTJOIN()

I don’t normally blog about Excel topics outside of Power Query and Power Pivot, but I think anyone who has ever done any serious work in Excel has found how difficult it is to concatenate more than two text values (although there are plenty of blog posts like this one showing how to do it). Well, not any more – Microsoft have finally got around to addressing this problem with two new Excel functions, TEXTJOIN() and CONCAT(). The announcement, plus news of other new Excel functions and a new funnel chart type, is here:

https://support.office.com/en-us/article/What-s-new-in-Excel-2016-for-Windows-5fdb9208-ff33-45b6-9e08-1f5cdb3a6c73

First, the bad news: at the time of writing you’ll only be able to use these functions in Excel on the desktop if you’re using the click-to-run version of Excel 2016 that you get through an Office 365 subscription (i.e. the version installed through the Office 365 portal that gets updated by Microsoft automatically – probably not the same version that you’re running on your desktop if you work for a big company). They are available in Excel Online and Excel Mobile too. I guess they’ll appear in the regular, Windows-installer version of Excel 2016 in a service pack at some point in the future.

How about some examples? First of all, CONCAT() is a successor function for the old CONCATENATE() function – as far as I can see it does everything CONCATENATE() does but crucially also supports ranges as inputs as well as individual text values. So

=CONCAT("The ", "cat ", "sat ", "on ", "the ", "mat.")

Returns

The cat sat on the mat.


…but also, if you have each of your words in different cells (nb I’ve added spaces at the end of each word here), you can concatenate all the values in a range like this:

=CONCAT(B2:B7)

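As far as I can see you can also mix ranges and individual text values in the same call, so, to make up an example using the same data, something like this:

=CONCAT("Once again: ", B2:B7)

…should return

Once again: The cat sat on the mat.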

The TEXTJOIN() function is more flexible and I suspect will be very popular. It gives you two benefits over CONCAT():

  1. The ability to specify a delimiter – a character or characters (for example a comma or a space) to insert between each item of text you want to concatenate
  2. The ability to ignore empty values

Some examples…

First, using a space in the first parameter:

=TEXTJOIN(" ",FALSE,"The","cat","sat","on","the","mat.")

Returns once again

The cat sat on the mat.

…without having to add spaces to the end of each word, as I did with the CONCAT() example above:


And

=TEXTJOIN(",",FALSE,"The","cat","sat","on","the","mat.")

Returns

The,cat,sat,on,the,mat.


Finally, here’s an example of how TEXTJOIN() can be used with a range, with the second parameter set to TRUE so that empty cells in the range are ignored:

=TEXTJOIN(",", TRUE,B2:B9)

With the data in the screenshot below, this again returns

The,cat,sat,on,the,mat.


You can see an Excel Online worksheet with all these examples here.

Analysing Power BI DMV Queries In Power BI Desktop

Experienced Analysis Services and Power Pivot developers will know all the interesting things you can learn from DMV queries. For example, in this blog post Kasper de Jonge shows how to use a DMV to find out which columns are using the most memory in a Power Pivot model; here’s an older post from me on using DMVs to show the dependencies between DAX calculations. There’s plenty of other interesting stuff about how data is stored and so on that you can find with a bit of digging, and there’s some reasonably up-to-date documentation on DMVs for SSAS here.
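
If you’ve never written one, DMV queries use a SQL-like syntax even though they run against an Analysis Services or Power Pivot model. For example (and, as far as I remember, this is the DMV used in the calculation dependency post mentioned above) the following query lists the dependencies between the DAX calculations and other objects in a Tabular or Power Pivot model:

SELECT * FROM $SYSTEM.DISCOVER_CALC_DEPENDENCY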

However, running DMV queries against a Power BI Desktop model (which of course runs a local version of the same engine that powers Analysis Services Tabular and Power Pivot), and, more importantly, doing something useful with the information they return, isn’t straightforward. You can run DMV queries from DAX Studio but that will only give you the table of data returned; you need to copy and paste that data out to another tool to analyse it. Instead it’s possible to use Power BI Desktop’s own functionality for connecting to Analysis Services to connect to its own local data model and run DMV queries.

If you’re connecting to an instance of Analysis Services in Power BI Desktop you need a server name and a database name and the same goes when you’re connecting Power BI Desktop to itself. This episode of Adam Saxton’s excellent Guy In A Cube YouTube show details how to connect SQL Server Profiler to Power BI Desktop in order to run a trace, and the method he uses to find the connection details also works for our purpose here; it’s definitely worth a watch because it goes into a lot of detail. However it’s much easier to get the connection details from DAX Studio using the following steps (thanks to Darren Gosbell for pointing this out):

  1. Open Power BI Desktop and load data as normal.
  2. Open DAX Studio and choose the Power BI Designer data source option:


  3. Look in the bottom right-hand corner of the screen and you’ll see the server name, including the all-important port number, that you have connected to. In this case it’s
    localhost:52981


  4. Run the following DMV query in a DAX query window. This will give you the nasty GUID that is the name of the only database in the Power BI data model:
    SELECT [CATALOG_NAME] FROM $SYSTEM.DBSCHEMA_CATALOGS
    


  5. Now go back to Power BI Desktop, click the Get Data button and choose Analysis Services. In the connection dialog enter the server name (including the port number) and the database name found in the previous steps, as well as your DMV query:


Once you have done this, you can load the results of your DMV query into the data model and analyse the results just like any other data. Here’s an M query using the DMV from the blog post by Kasper referenced earlier that looks at the memory used by just one table in the model:

let
    Source = AnalysisServices.Database(
     "localhost:52981", 
     "1107a5df-3dd9-4c16-80b6-cf571c4e794f", 
     [Query="SELECT dimension_name, attribute_name, DataType,
                   (dictionary_size/1024) AS dictionary_size 
                   FROM $system.DISCOVER_STORAGE_TABLE_COLUMNS"]),
    #"Filtered Rows" = Table.SelectRows(Source, 
     each ([dimension_name] = "FactInternetSales"))
in
    #"Filtered Rows"

And here’s a bar chart built from that query showing clearly that the SalesOrderNumber column in the FactInternetSales table is very expensive:


Before you get too excited about this, there are two major problems you need to be aware of when using this technique:

  1. The port number used to connect to the local data model and the database name will change every time you open the Power BI Desktop file, so you will have to edit the connection information manually if you want to refresh the data after reopening (see the sketch after this list for one way to make this less painful).
  2. You won’t be able to make this connection refresh once you have published the file to PowerBI.com – so this will only work on the desktop.
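
To make the first problem slightly less painful you can put the port number and database name in variables at the top of your query, so there is only one place to edit each time you reopen the file. Here’s a quick sketch based on the query above (the variable names are mine, and the server and database values shown are from my session – they will certainly be different for you):

let
    //edit these two values each time the .pbix file is reopened
    ServerAndPort = "localhost:52981",
    DatabaseName = "1107a5df-3dd9-4c16-80b6-cf571c4e794f",
    Source = AnalysisServices.Database(
     ServerAndPort,
     DatabaseName,
     [Query="SELECT dimension_name, attribute_name, DataType,
                   (dictionary_size/1024) AS dictionary_size 
                   FROM $system.DISCOVER_STORAGE_TABLE_COLUMNS"])
in
    Source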

That said, I think this is still very useful for development purposes. At some point I expect we’ll probably get a range of management reports in PowerBI.com that show similar data on published reports – just like we used to have in the old Office 365 Power BI.

Monitoring SSAS Multidimensional MDX Query Results Serialisation, Part 2

In part 1 of this series I looked at the basics of monitoring SSAS Multidimensional query resultset serialisation in Profiler. In this post, I’ll be taking a look at what happens for queries that return large amounts of data.

Consider the following query on the Adventure Works DW database which, when I run it in SQL Server Management Studio, returns a cellset with 60391 rows:

SELECT
{[Measures].[Internet Order Quantity], 
[Measures].[Internet Sales Count]}
ON 0,
NON EMPTY
[Date].[Date].[Date].MEMBERS
*
[Customer].[Customer].[Customer].MEMBERS
*
[Product].[Product].[Product].MEMBERS
ON 1
FROM
[Adventure Works]


There are three interesting things to note about this query. First, SQL Server Management Studio on my laptop says that it takes nine seconds to run, even on a warm cache; the Duration column for the Query End event in Profiler, however, shows a value of around six seconds. The three-second difference must be the time it takes for SSAS to return the cellset to SQL Server Management Studio, and for SQL Server Management Studio to render the results (my guess is that it’s the second operation that takes the majority of this time – other client tools may be more efficient at rendering large resultsets).

Secondly, in Profiler, you’ll see a much larger number of Serialize Results Current events. In situations where an axis contains more than a thousand tuples, or a cellset contains more than a thousand cells, you’ll see one Serialize Results Current event for each thousand tuples or cells. The ProgressTotal column will show values incrementing by one thousand up to the total number of tuples or cells. So, here’s some of what Profiler shows for the serialisation of the Rows axis:


…and here’s the end of the trace, showing the end of the serialisation of the cells (60391 rows * 2 columns = 120782 cells):


The third thing to notice is that there is only one Storage Engine operation – shown by the Query Subcube Verbose event in the first Profiler screenshot above – and that hits the Storage Engine cache and is so quick the Duration column shows 0 ms. Getting the raw data isn’t the problem here, and there aren’t any MDX calculations either – which means that it’s the Non Empty filter and construction of the cellset that is taking all the time. Since neither of these operations can be cached (although you can play tricks like this), this explains why the query always takes six seconds to run, even on a warm cache. Further investigation reveals that the Non Empty filter in fact only takes about a quarter of a second, so it’s the construction of a large cellset that’s the real problem here. This is why I say you should always avoid queries that return large amounts of data! SSAS is not very good at returning large resultsets.

Incidentally, don’t fall into the trap of thinking that the value shown in the Duration column for the Serialize Results End event only represents the amount of time taken to construct the cellset. It shows the amount of time since the Serialize Results Begin event, and in between the Begin and End events all kinds of other things necessary for the query to return (such as the evaluation of MDX calculations) could be going on. In a lot of cases the Serialize Results End event shows a duration that is almost the same as the duration for the whole query, but that only means that serialisation was able to start soon after the query began. In order to find the overhead of serialisation you need to work out how long all these other things take and subtract that from the duration shown for Serialize Results End, and that’s easier said than done.

Finally, what can you do to improve performance? Well, in the first post in this series I showed there was a tabular alternative to a cellset, and this is certainly a lot more efficient at returning large amounts of data (although you probably won’t have a choice in this unless you are building your own client tool, and SSRS uses the tabular format anyway). For this query a tabular resultset is almost two seconds faster to return than a cellset, at just over four seconds:


There’s another important technique you can use, one that I have already mentioned in a blog post a couple of years ago but which is worth mentioning again: each cell returned by this query returns a large number of properties that you may not need, and these extra properties have a significant effect on the size of the resultset. Adding a CELL PROPERTIES clause to the query so that you only return the VALUE property, like so:

SELECT
{[Measures].[Internet Order Quantity], 
[Measures].[Internet Sales Count]}
ON 0,
NON EMPTY
[Date].[Date].[Date].MEMBERS
*
[Customer].[Customer].[Customer].MEMBERS
*
[Product].[Product].[Product].MEMBERS
ON 1
FROM
[Adventure Works]
CELL PROPERTIES VALUE

…takes another two seconds off the duration of the query, whether you use a tabular resultset or a cellset:


Monitoring SSAS Multidimensional MDX Query Results Serialisation, Part 1

Profiler (or indeed XEvents) can tell you a lot of interesting things about what happens when SSAS Multidimensional serialises the resultset returned by an MDX query. To be honest, this isn’t something I’ve looked at in detail before but recently I decided to do some research in this area – it turns out that monitoring the Profiler events related to serialisation can be very useful when you’re trying to understand what a Profiler trace is telling you about query execution as a whole.

To start off, let’s look at some very simple examples. Consider the following MDX query on the Adventure Works DW database:

SELECT
{[Measures].[Internet Order Quantity],
[Measures].[Internet Sales Count]}
ON 0,
[Date].[Calendar Year].[Calendar Year].MEMBERS
ON 1
FROM
[Adventure Works]

If you run it in SQL Server Management Studio in an MDX query window, you’ll get the following cellset back:


Running a Profiler trace using, amongst others, the Serialize Results Begin, Serialize Results Current and Serialize Results End events shows how SSAS is constructing the cellset returned:


The Serialize Results Begin event marks the point where SSAS starts to construct the cellset returned. The Serialize Results Current events that immediately follow it, with EventSubclass “1 – Serialize Axes”, show SSAS serialising the tuples that are present on the Columns axis (listed as Axis 0 in the TextData column), the Rows axis (Axis 1) and the Where clause (Slicer Axis). The numeric values in the ProgressTotal column for the Serialize Results Current events show the number of tuples on each axis: the two tuples on columns are the two measures, the six tuples on rows are the six years, and there’s one tuple on the slicer. After that SSAS gets the data for each of the cell values (as shown by the Query Subcube Verbose event – note that this query is running on a warm cache) and there is then a Serialize Results Current event with EventSubclass “2 – Serialize Cells”; the ProgressTotal column shows that twelve cells (2 columns * 6 rows) in total were returned. The Serialize Results End event shows that SSAS has finished constructing the cellset and not surprisingly it’s followed immediately by the Query Cube End and Query End events.

Sometimes SSAS needs to do a bit more work to find out what tuples are on an axis before serialisation can begin. The following query adds a NON EMPTY to the rows axis of the query above, so that only the rows that have values are returned:

SELECT
{[Measures].[Internet Order Quantity],
[Measures].[Internet Sales Count]}
ON 0,
NON EMPTY
[Date].[Calendar Year].[Calendar Year].MEMBERS
ON 1
FROM
[Adventure Works]


In order to know which years will appear on rows it has to query the cube to find out which ones have values for the measures on columns; as a result the Profiler trace shows a Query Subcube Verbose event appearing now before Serialize Results Begin:


This trace deliberately doesn’t show any of the events related to the NON EMPTY operation (something I’ll be writing about in future blog posts), but it looks like SSAS can work out which years have values and get the values needed for the cellset back in a single operation, so there’s no need for another Query Subcube Verbose event before the Serialize Cells event. Note that now that the empty rows have been excluded, the ProgressTotal column shows that only four tuples are returned on rows and eight cells are returned overall.

Cellsets are not the only type of resultset that SSAS can return though. Some client tools, most notably Reporting Services, return a tabular dataset that doesn’t have the concept of axes; running the previous query in SSRS gives the following in Profiler:


As you can see, there is now only one Serialize Results Current event and it has the EventSubclass “3 – Serialize SQL Rowset” (even though this is still an MDX query); the ProgressTotal column shows four, which is the number of rows returned.

In Part 2, I’ll look at what happens when a query returns a much larger amount of data.

The Table.Schema() Function In Power BI/M

Yet another new M function for you this week: the Table.Schema() function, which returns information about the columns in a table. There’s some fairly detailed documentation about what it returns here; a simple demo is always a lot more helpful though, I think.

If you connect to the Adventure Works DW database in SQL Server and import the DimDate table, you’ll get an M query that returns the contents of that table (along with some extra columns that describe the relationships between that table and the others in the database):

let
    Source = Sql.Database("localhost", "adventure works dw"),
    dbo_DimDate = Source{[Schema="dbo",Item="DimDate"]}[Data]
in
    dbo_DimDate


If you add an extra step to this query that calls the Table.Schema() function on the table returned by the dbo_DimDate step, like so:

let
    Source = Sql.Database("localhost", "adventure works dw"),
    dbo_DimDate = Source{[Schema="dbo",Item="DimDate"]}[Data],
    GetSchema = Table.Schema(dbo_DimDate)
in
    GetSchema 

…you get a table with one row for each column in the table returned by dbo_DimDate, and a lot of columns that give you information on each column such as its position in the table, its M data type, its data type in the original data source and so on:


I can think of a lot of uses for this. Documenting a database is an obvious one; it would also serve as a much richer source of data when checking for changes in the structure of a data source, as I described here. Also, given how easy it is to tell whether two tables contain the same data in M, you could use this function to compare whether two tables have the same columns like so:

let
    Source = Sql.Database("localhost", "adventure works dw"),
    dbo_DimDate = Source{[Schema="dbo",Item="DimDate"]}[Data],
    SomeOtherTable = 
     Source{[Schema="dbo",Item="SomeOtherTable"]}[Data],
    TablesEquivalent = 
     (Table.Schema(dbo_DimDate)=Table.Schema(SomeOtherTable ))
in
    TablesEquivalent

If you want more detail when doing comparisons you can do that with a little bit more M code, but that’s probably a topic for another post…
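
As a quick taster, though: Table.Schema() returns a Name column, so you could, for example, use List.Difference() to list the columns that appear in one table but not in another. A minimal sketch (using the same placeholder table names as above):

let
    Source = Sql.Database("localhost", "adventure works dw"),
    dbo_DimDate = Source{[Schema="dbo",Item="DimDate"]}[Data],
    SomeOtherTable = 
     Source{[Schema="dbo",Item="SomeOtherTable"]}[Data],
    //column names that appear in DimDate but not in the other table
    MissingColumns = List.Difference(
     Table.Schema(dbo_DimDate)[Name],
     Table.Schema(SomeOtherTable)[Name])
in
    MissingColumns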

Descriptive Statistics In Power BI/M With Table.Profile()

As Buck Woody notes here, when you are exploring a new data set it can be useful to calculate some basic descriptive statistics. One new M function that appeared in Power BI recently can help you to do this: Table.Profile(). This function takes a value of type table and returns a table that displays, for each column in the original table, the minimum, maximum, average, standard deviation, count of values, count of null values and count of distinct values (but no mode or median?). So, given the following table:

…the Table.Profile() function returns the following table:

Of course you could create something similar yourself fairly easily (as I have done for a customer in the past), and it’s not as sophisticated as the Quick Insights feature, but it’s handy to have a single function that does all this.
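
If you want to try the function out without connecting to a data source, here’s a minimal sketch you can paste into a blank query; the sample table is made up, but it shows the shape of the output:

let
    //a small made-up table to profile
    Source = #table(
     type table [Month = text, Sales = number],
     {{"Jan", 100}, {"Feb", 200}, {"Mar", null}, {"Apr", 400}}),
    Profile = Table.Profile(Source)
in
    Profile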

You could even use it on all of the tables in a SQL Server database. Since the Sql.Database() function returns a table containing all of the tables and views in a database, like so:


All you need to do to use Table.Profile() on all these tables is to add a new custom column that calls this function for every value in the Data column:


Then finally expand the new custom column and you’ll see the stats for every column in every table:


Here’s the code:

let
    Source = Sql.Database("localhost", "adventure works dw"),
    #"Added Custom" = Table.AddColumn(Source, "Profile", 
      each Table.Profile([Data])),
    #"Expanded Profile" = Table.ExpandTableColumn(#"Added Custom", 
      "Profile", 
      {"Column", "Min", "Max", "Average", "StandardDeviation", "Count", "NullCount", "DistinctCount"}, 
      {"Column", "Min", "Max", "Average", "StandardDeviation", "Count", "NullCount", "DistinctCount"})
in
    #"Expanded Profile"