Using the Resource Usage Profiler Event For SSAS Multidimensional Query Performance Tuning, Part 1: Rows Scanned And Rows Returned

If you’re performance tuning SSAS Multidimensional Storage Engine issues, the Resource Usage Profiler event can provide a lot of useful information about what’s going on behind the scenes when you run a query. This is something I have blogged about in the past (and it will be useful to read this post before carrying on) but recently I’ve done some more research into this area and found out a lot more things about what this event tells you.

For my testing I created a very simple cube from a single fact table. The table contained 5000 rows and two columns: a dimension key column containing the values 1 to 5000, and a measure column that always contained the value 1. From this I built a single measure group with a single measure called My Measure, and a single dimension built from the dimension key column with 5000 members on it called ID.

Consider the following MDX query:

SELECT 
{[Measures].[My Measure]}
ON 0
FROM
[MyCube]
WHERE([ID].[ID].&[1])

Here’s the output:

image

In this query the WHERE clause filters the output to one member from the ID dimension, which in turn returns data from one row in the underlying fact table, and so My Measure returns 1. If the query is run on cold cache, the Resource Usage Profiler event returns the following:

image

The ROWS_RETURNED value here returns 1, which is what you might expect – the query results show data from one row in the underlying fact table. The ROWS_SCANNED value is 256 though. Why? Chapter 20 of the book “Microsoft SQL Server 2008 Analysis Services Unleashed” has a lot of detail about how SSAS MD stores data on disk, but the important point here is that the data in a partition is stored on disk in segments, with each segment made up of pages, and when a query is run SSAS will scan all the pages that it thinks contain data. The ever-reliable Akshai Mirchandani of the SSAS dev team helped me with the remaining information I needed:

  • There are 65536 rows of data per segment
  • There are 256 pages per segment
  • There are therefore, at most, 256 rows per page

So in this case ROWS_SCANNED shows 256 because one complete page was scanned.

Modifying the query to slice on two members from ID like so:

SELECT 
{[Measures].[My Measure]}
ON 0
FROM
[MyCube]
WHERE([ID].[ID].&[1]:[ID].[ID].&[2])

image

…results in a ROWS_RETURNED of 2 and a ROWS_SCANNED that is still 256, because the two rows must be stored in the same page:

image

…while asking for 257 members from ID in the WHERE clause like so:

SELECT 
{[Measures].[My Measure]}
ON 0
FROM
[MyCube]
WHERE([ID].[ID].&[1]:[ID].[ID].&[257])

Results in a ROWS_RETURNED of 257 and a ROWS_SCANNED of 257 – obviously 2 pages are now being scanned to get the data needed for the query.

image

Finally, the query:

SELECT 
{[Measures].[My Measure]}
ON 0
FROM
[MyCube]
WHERE([ID].[ID].&[5000])

…returns a rows scanned of 136:

image

This must be because the final page, which doesn’t contain the full 256 rows, is scanned. 5000-136=4864, and 4864/256=19, so there must be 20 pages of data: 19 pages of 256 rows and one final page of 136 rows.

I don’t think it’s worth getting too hung up on the exact values that ROWS_SCANNED and ROWS_RETURNED, especially given that they return totals for all Storage Engine activity for all measure groups for the whole query, but knowing that they tell you roughly how much work is being done my the Storage Engine means that you can use them to watch for warning signs that something isn’t working properly when you’re performance tuning queries. In subsequent parts of this series I’ll show some practical examples of this.

Making Sure All Columns Appear When You Combine Data From Multiple Files In Power BI/Power Query M

Here’s a really common problem that occurs when combining data from multiple files, or indeed any type of data source, in Power BI/Power Query/Excel Get&Transform. Imagine you have a folder with two Excel files in, and each Excel file contains a table called SalesTable:

image

image

image

You use the “From Folder” data source to combine all the data from all the Excel files in this folder, you get a table like this:

image

…and you’re happy. Then, at some later date a third file is added to the folder that has an extra column in its SalesTable table called Comments:

image

You refresh your query, though, and you don’t see the Comments column anywhere:

image

Why not? If you look at the query that has been generated and go back to the “Removed Other Columns1” step you’ll see a table containing a column containing table values:

image

…and you’ll also see that the next step in the query, “Expanded Table Column1”, uses the Table.ExpandTableColumn function – the M function that gets called if you click the Expand/Aggregate button in the column header highlighted in the previous screenshot – to flatten these nested tables out. And the problem is that Table.ExpandTableColumn needs to know in advance the names of the columns you want to expand.

Now this is an extremely common, and powerful, Power Query/M pattern. Apart from the “From Folder” functionality for automatically combining data from multiple files it’s something I find myself building manually all the time: write a function, for example to make a single call to a web service; create a table containing one row for each call to the web service that I want to make, use the Invoke Custom Function button to call my function for each row, and then expand to get all the data from all the function calls. And the more I use this pattern, the more I run into situations where I don’t see columns I’m expecting to see because I’ve done an Expand in an earlier step that has a hard-coded list of column names in it (it’s a very similar problem to the one that Ken Puls blogged about here). It’s a pain to have to keep changing this list, and the real problem comes when you don’t actually know in advance what the names of the columns to expand are.

One solution would be to do something similar to what I show in this post: iterate through all the tables in the table column, find a distinct list of column names, and then use this list with Table.ExpandTableColumn. However, there is an easier way to handle this: use Table.Combine instead of Table.ExpandTableColumn. The great thing about Table.Combine is that it will always return all of the columns from all of the tables it’s combining.

Here’s a function that shows how it can be used:

(TableColumn as list, optional SourceNameColumn as list) =>
let
  AddIDs = 
    if 
    SourceNameColumn=null 
    then 
      TableColumn 
    else 
      let
        ZipNames = 
          List.Zip({TableColumn, SourceNameColumn}),
        AddColumnFunction = 
          (ListIn as list) => 
          Table.AddColumn(ListIn{0}, "Source", each ListIn{1}),
        AddColumns = 
          List.Transform(ZipNames, each AddColumnFunction(_))
      in
        AddColumns,
  Combine = Table.Combine(AddIDs)
in
  Combine

This function takes a list of tables and, optionally, a list of text values that contain a name for each table (this optional parameter accounts for the majority of the code – without it all you would need is the Combine step). If you paste this code into a new query called, say, CombineTables, you can either call it by adding some M code to an existing query or more easily just call it direct from the UI. In the latter case when you click on the function query in the Query Editor window you’ll see this:

image

Assuming you already have a query like the one shown above that contains a column with table values in it and another column containing the original Excel file names, you need to click the Choose Column button for the TableColumn parameter and select the column that contains the table values in the dialog that appears:

image

…and then do the same thing for the SourceNameColumn parameter:

image

…and then click the Invoke button back in the Query Editor, and you’ll get a table containing all of the data from the SalesTable table in each workbook, including the Comments column from the third Excel workbook:

image

With no hard-coded column names you’ll now always get all of the data from all of the columns in the tables you’re trying to combine.