Reducing The Number Of Queries Generated By Excel Cube Function Reports On Power BI, Analysis Services And Power Pivot

I am a big fan of using Excel cube functions for reporting on Power BI datasets, Analysis Services and Power Pivot: they allow for a lot more layout flexibility than PivotTables when building reports in Excel. However, they do have a reputation for poor performance, and part of the reason for this is their chattiness. While Excel does not generate one query for each cell containing a cube function, it is true that a report using cube functions will generate a lot more MDX queries against your Power BI dataset/Analysis Services cube/Power Pivot model than the equivalent PivotTable. As a result, one way to improve the performance of reports that use Excel cube functions is to optimise them to reduce the number of MDX queries generated.

To understand how to do this you first need to understand how Excel generates the MDX queries needed by cube functions. First of all it looks at the cells containing CubeValue functions on a worksheet and groups them together by the granularity of the data they are requesting; then, for each granularity, it runs one or more MDX queries to get the data it needs, where each query gets data for up to 500 cells. There’s not much you can do to control this behaviour, but in situations where you have multiple fact tables with different granularities there is a trick you can play to reduce the number of queries.
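As a rough mental model of this behaviour, here is a Python sketch. It is a simplification built only from the description above, not Excel's actual algorithm: each CubeValue cell is reduced to the set of hierarchies it references (an assumption about what "granularity" means here), cells are grouped by that set, and each group costs one query per 500 cells. The function name is hypothetical, and the extra metadata query Excel sends is not counted.

```python
from collections import Counter
from math import ceil

CELLS_PER_QUERY = 500  # per the description above: up to 500 cells per query


def estimate_data_queries(cell_granularities):
    """Estimate how many MDX data queries a worksheet needs.

    cell_granularities: one iterable of hierarchy names per CubeValue cell.
    Simplified model: cells are grouped by granularity, and each group
    needs ceil(cells / 500) queries. The metadata query is not counted.
    """
    groups = Counter(frozenset(g) for g in cell_granularities)
    return sum(ceil(count / CELLS_PER_QUERY) for count in groups.values())


# 600 cells at one granularity plus 1 cell at another: 2 + 1 queries
mixed = [{"Country", "Measures", "Product"}] * 600 + [{"Country", "Measures"}]
# The same 601 cells, all at one granularity: ceil(601 / 500) = 2 queries
aligned = [{"Country", "Measures", "Product"}] * 601

print(estimate_data_queries(mixed))    # 3
print(estimate_data_queries(aligned))  # 2
```

In this model, making granularities match is the only lever you have, which is exactly what the trick below does.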

Let’s take a simple example. Consider the following source data:

…loaded into a Power BI dataset with two fact tables, Sales and Targets, and two dimension tables, Product and Country:

Now consider the following report that uses two groups of cube formulas to get the Sales Amount for Apples in the UK and the Target Amount for the UK:

Here are the formulas for these cells:

This worksheet generates two MDX queries for the two different granularities (plus one other MDX query that gets some metadata). The first gets the Sales Amount for Apples in the UK and populates the CubeValue function in cell D3. This query consists of a single MDX tuple whose granularity is Country, Measure and Product:

SELECT 
{([Country].[Country].&[UK],[Measures].[Sales Amount],[Product].[Product].&[Apples])} 
ON 0 
FROM [Model] 
CELL PROPERTIES VALUE, FORMAT_STRING, LANGUAGE, BACK_COLOR, FORE_COLOR, FONT_FLAGS

The second gets the Target Amount for the UK and populates the CubeValue function in cell D6. It consists of a single MDX tuple whose granularity is Country and Measure:

SELECT 
{([Country].[Country].&[UK],[Measures].[Target Amount])} 
ON 0 
FROM [Model] 
CELL PROPERTIES VALUE, FORMAT_STRING, LANGUAGE, BACK_COLOR, FORE_COLOR, FONT_FLAGS

It is possible to get the same data in a single MDX query and the key to doing so is to make the granularity of the two requests the same. One way of doing this is to edit the contents of cell D6, which at this point contains the following formula to get the Target Amount (in D5) for the UK (in C6) using the CubeValue function:

=CUBEVALUE("CubeFunctionsOptimisationDataset", C6,D5)

If you add an extra reference to cell C3, which contains the CubeMember function returning the Product Apples, like so:

=CUBEVALUE("CubeFunctionsOptimisationDataset", C6,D5, C3)

then exactly the same data is returned to Excel and displayed in the worksheet, but only a single MDX query is generated:

SELECT 
{([Country].[Country].&[UK],[Measures].[Target Amount],[Product].[Product].&[Apples]),
([Country].[Country].&[UK],[Measures].[Sales Amount],[Product].[Product].&[Apples])} 
ON 0 
FROM [Model] 
CELL PROPERTIES VALUE, FORMAT_STRING, LANGUAGE, BACK_COLOR, FORE_COLOR, FONT_FLAGS

As you can see, this query now consists of two tuples whose granularity is Country, Measure and Product. This works because adding the reference to the Product Apples makes no difference to the value returned by the Target Amount measure, which has no relationship with the Product dimension table, but it is enough to fool Excel into thinking that the CubeValue function in cell D6 is making a request at the same granularity as the CubeValue function in cell D3. Note, though, that you need to add a reference to an individual Product, such as Apples, rather than to the All Member on Product.
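To see why the extra Apples reference is harmless, here is a toy Python model of a filter on an unrelated dimension. The rows and values are hypothetical, and this is a big simplification of how the engine evaluates filters: a filter is applied only if the fact table actually has that column, so a Product filter has no effect on the Targets rows.

```python
# Hypothetical rows. Targets has no Product column because the Targets
# table has no relationship with the Product dimension table.
SALES = [
    {"Country": "UK", "Product": "Apples", "Amount": 10},
    {"Country": "UK", "Product": "Oranges", "Amount": 20},
    {"Country": "France", "Product": "Apples", "Amount": 30},
]
TARGETS = [
    {"Country": "UK", "Amount": 100},
    {"Country": "France", "Amount": 200},
]


def evaluate(rows, filters):
    """Sum Amount over rows matching every filter on a column the rows have.
    Filters on columns the rows lack (unrelated dimensions) are ignored."""
    return sum(
        row["Amount"]
        for row in rows
        if all(row[col] == val for col, val in filters.items() if col in row)
    )


print(evaluate(SALES, {"Country": "UK", "Product": "Apples"}))    # 10
print(evaluate(TARGETS, {"Country": "UK"}))                       # 100
print(evaluate(TARGETS, {"Country": "UK", "Product": "Apples"}))  # 100
```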

Another, perhaps more complicated, way of achieving the same result is to leave the formula in cell D6 the same but change the formula in C6 from:

=CUBEMEMBER("CubeFunctionsOptimisationDataset", "[Country].[Country].[All].[UK]")

…to use the tuple form of CubeMember to get the combination of Apples and UK:

=CUBEMEMBER("CubeFunctionsOptimisationDataset", {"[Product].[Product].[All].[Apples]","[Country].[Country].[All].[UK]"})

Note that when you use this tuple form of CubeMember, putting Apples first in the tuple and Country second results in only the Country name being displayed in the cell, so again the data displayed in Excel is exactly the same.

Reducing the number of MDX queries in this way can improve performance for two reasons: it reduces the number of round trips to the dataset and it gives the Analysis Services engine (in Power BI, Analysis Services or Power Pivot) the chance to get the data needed in a more optimal way than might be possible with multiple queries. I don’t think the technique in this post will make a massive difference to performance but if you do try this, I’m curious to see how much of an improvement you see.

Filtering An Excel Cube Function Report By A List Of Manually-Entered Values

In Power BI there’s a popular custom visual called “Filter by list” that lets you filter a Power BI report by any list of values that you paste into it. It can save you a lot of time in some scenarios, for example if you need to copy a list of values from another application and select those values in a slicer. In this post I’ll show how to recreate the same functionality in an Excel report connected to Power BI, Analysis Services or the Excel Data Model/Power Pivot using cube functions and dynamic arrays.

To show how, I’m going to use a super-simple model built using Power Pivot that consists of the following single table:

The only other thing to note about the model is that it contains a measure called Sales Amount that sums up the values in the Sales column:

Sales Amount:=SUM(Sales[Sales])

Here’s what a PivotTable connected to this model looks like:

The aim here is to recreate this PivotTable using cube functions and allow the user to enter the list of invoice numbers used to slice the data either manually or by copy-and-pasting them into a table.

The first step is to create an Excel table (which I’ve called InvoiceNumbers) to hold the invoice numbers the user enters:

The next thing to do is to generate the text of the MDX set expression representing the list of invoice numbers in this table, which I’ve put in cell D2:

="{" & TEXTJOIN(",",TRUE, "[Sales].[Invoice Number].[Invoice Number].&[" & InvoiceNumbers & "]" ) &"}"
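If the string concatenation is hard to follow, this Python sketch builds the same text as the D2 formula from a list of invoice numbers (the invoice numbers used are hypothetical). Note that, just as in the Excel formula, a blank entry still produces a member reference ending in &[], because the prefix and suffix are concatenated onto every value before TEXTJOIN sees it.

```python
PREFIX = "[Sales].[Invoice Number].[Invoice Number].&["


def invoice_set_mdx(invoice_numbers):
    """Python equivalent of the D2 formula (TEXTJOIN over the InvoiceNumbers table)."""
    # Wrap every value in a member key reference, then join into one MDX set
    members = [PREFIX + str(number) + "]" for number in invoice_numbers]
    return "{" + ",".join(members) + "}"


print(invoice_set_mdx([1, 3]))
# {[Sales].[Invoice Number].[Invoice Number].&[1],[Sales].[Invoice Number].[Invoice Number].&[3]}
```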

This text is used to create two named sets using the CUBESET function. The first, which I’ve put in cell D3, simply returns the set of invoice numbers that you get from evaluating the above MDX expression:

=CUBESET("ThisWorkbookDataModel", $D$2, "Invoice Numbers")

The second named set, in D4, is more complicated: it returns the set of customers that have sales for these invoice numbers.

=CUBESET(
"ThisWorkbookDataModel", 
"NONEMPTY( [Sales].[Customer].[Customer].MEMBERS, {[Measures].[Sales Amount]} * " & $D$2 & ")",
"Customers")

Last of all are the cube functions that display the report itself. In cell E6 is the CUBEMEMBER function returning the measure Sales Amount:

=CUBEMEMBER("ThisWorkbookDataModel", "[Measures].[Sales Amount]")

In D7 is the formula (using the technique I blogged about here) to get the list of Customers returned by the second named set above:

=MAKEARRAY(
CUBESETCOUNT($D$4), 
1, 
LAMBDA(r,c, CUBERANKEDMEMBER("ThisWorkbookDataModel", $D$4, r))
)

Finally, in E7, is the expression that gets the Sales Amount values for each customer, also sliced by the set of selected invoice numbers:

=MAKEARRAY(
CUBESETCOUNT($D$4),
1,
LAMBDA(r,c,
CUBEVALUE("ThisWorkbookDataModel", INDEX($D$7#,r), $D$3, $E$6))
)
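For readers who want to check the logic without a Data Model to hand, here is a toy Python equivalent of what the two named sets and the MAKEARRAY formulas compute together: keep only rows for the selected invoices, drop customers with no such sales (the NONEMPTY step), and sum Sales Amount per remaining customer (the CUBEVALUE step). The sales rows are hypothetical, and sorting customers by name is only a stand-in for the hierarchy's member order.

```python
# Hypothetical rows of the single Sales table: (invoice number, customer, sales)
SALES = [
    (1, "Alice", 10),
    (2, "Bob", 20),
    (3, "Alice", 5),
    (4, "Carol", 7),
]


def filter_by_list(rows, selected_invoices):
    """Return [(customer, Sales Amount)] for the selected invoices only."""
    selected = set(selected_invoices)
    totals = {}
    for invoice, customer, sales in rows:
        if invoice in selected:
            totals[customer] = totals.get(customer, 0) + sales
    # Customers with no sales for these invoices never get a key, which is
    # the effect of NONEMPTY in the "Customers" set.
    return sorted(totals.items())


print(filter_by_list(SALES, [1, 3, 4]))  # [('Alice', 15), ('Carol', 7)]
```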

Here are the formulas all together:

And here it all is working:

One last point: to keep things simple I’ve not included any error handling, which means that if a user enters a blank value or a value that isn’t an invoice number in the table the whole thing will break. To handle errors using the technique I blogged about here, alter the formula in D2 to:

="{"&
TEXTJOIN(
",",
TRUE,
LET(
MemberExpression,
"[Sales].[Invoice Number].[Invoice Number].&["&InvoiceNumbers&"]",
"IIF(ISERROR(STRTOMEMBER("""&MemberExpression&""")), {}, STRTOMEMBER("""&MemberExpression&"""))")
) &"}"
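The tripled double quotes are the hardest part of this formula to read: inside an Excel string literal, "" stands for a single " character. This Python sketch produces the same text for a list of invoice numbers, so you can see the MDX that each entry becomes (the invoice numbers passed in are hypothetical).

```python
def guarded_invoice_set_mdx(invoice_numbers):
    """Build the same text as the error-handling version of the D2 formula.

    Each entry becomes IIF(ISERROR(STRTOMEMBER("<member>")), {}, STRTOMEMBER("<member>")),
    so an entry that does not resolve to a member contributes an empty set.
    """
    parts = []
    for number in invoice_numbers:
        member = f"[Sales].[Invoice Number].[Invoice Number].&[{number}]"
        # {{}} in an f-string is a literal {} (the empty MDX set)
        parts.append(
            f'IIF(ISERROR(STRTOMEMBER("{member}")), {{}}, STRTOMEMBER("{member}"))'
        )
    return "{" + ",".join(parts) + "}"


print(guarded_invoice_set_mdx([1]))
```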

You can download the example workbook here (although it may not work unless you’ve got a version of Excel with dynamic arrays enabled).

How Defining Too Many Measures In A Live Connection Report Can Affect Power BI Query Performance

You probably know that it’s a best practice to build your Power BI datasets in a separate .pbix file from your reports – among other things it means that different people can develop the dataset and reports. You may also know that if you are building a report in Power BI Desktop with a Live connection to a published dataset or Azure Analysis Services you can define your own measures inside the report. While this is very convenient, if you create too many measures there’s a price to pay in terms of query performance.

To illustrate this, let’s say you have a super-simple dataset published to the Power BI Service (or a database in Analysis Services Tabular or Azure Analysis Services) that contains one table with three rows in it, two columns and a simple measure:

If you open Power BI Desktop and create a Live connection to this dataset, you can create a new measure in the normal way and then use it in a table like so:

If you take a look at the DAX query that is generated by this table visual you’ll notice that the MyReportMeasure measure, defined in the report, is defined at the top of the query while the Sales Amount measure, defined in the dataset, is not:

DEFINE
    MEASURE 'Sales'[MyReportMeasure] = ( 
    [Sales Amount] + 1 
    )
    VAR __DS0Core =
        SUMMARIZECOLUMNS (
            ROLLUPADDISSUBTOTAL (
                'Sales'[Product],
                "IsGrandTotalRowTotal"
            ),
            "Sales_Amount", 'Sales'[Sales Amount],
            "MyReportMeasure", 'Sales'[MyReportMeasure]
        )
    VAR __DS0PrimaryWindowed =
        TOPN (
            502,
            __DS0Core,
            [IsGrandTotalRowTotal], 0,
            'Sales'[Product], 1
        )
EVALUATE
__DS0PrimaryWindowed
ORDER BY
    [IsGrandTotalRowTotal] DESC,
    'Sales'[Product]

Here’s what DAX Studio’s Server Timings shows about this query when it runs on a cold cache:

As you would expect, it’s pretty quick, taking just 16ms.

In this example MyReportMeasure is what is known as a query-scoped measure: it is created when the query runs and ceases to exist when the query finishes. The problem is that creating a query-scoped measure has some costs associated with it: for example, Power BI/Analysis Services needs to do some dependency analysis to find out which other measures it refers to, and the more measures there are, the longer this takes.
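As a toy illustration of what dependency analysis involves (not the engine's actual algorithm, which isn't described here), this Python sketch parses every measure definition for [Name] references, builds a dependency graph, and then walks it. Building the graph touches every definition whether or not the query uses it, which is one intuitive way to see why the cost grows with the number of measures. The expressions and the extra measure name are hypothetical, and the regex-based parsing is deliberately naive.

```python
import re

# Naive pattern for bracketed names such as [Sales Amount]. A column
# reference like Sales[Sales] also matches, but is ignored below because
# it is not a measure name.
REFERENCE = re.compile(r"\[([^\]]+)\]")


def build_dependency_graph(definitions):
    """Map each measure to the measures its expression refers to directly.
    This step parses every definition, used by the query or not."""
    return {
        name: {ref for ref in REFERENCE.findall(expression) if ref in definitions}
        for name, expression in definitions.items()
    }


def all_dependencies(measure, graph):
    """Every measure that `measure` depends on, directly or indirectly."""
    found, to_visit = set(), [measure]
    while to_visit:
        for dependency in graph.get(to_visit.pop(), set()):
            if dependency not in found:
                found.add(dependency)
                to_visit.append(dependency)
    return found


definitions = {
    "Sales Amount": "SUM(Sales[Sales])",     # hypothetical definition
    "MyReportMeasure": "[Sales Amount] + 1",
    "Unused Measure": "[Sales Amount] * 2",  # hypothetical extra measure
}
graph = build_dependency_graph(definitions)
print(all_dependencies("MyReportMeasure", graph))  # {'Sales Amount'}
```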

To show the impact I generated the DAX definition of 3000 measures in Excel and pasted them into the DEFINE clause of the query above:

[NB this is not exactly what happens in the real world: only the measures you need for a query, and the measures that these measures depend on, are defined in the query, but the dependency analysis happens all the same]
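The 3000 definitions above were generated in Excel and their contents aren't shown. As a hypothetical stand-in, this Python snippet produces definitions of the same shape as MyReportMeasure, ready to paste under DEFINE; the names Measure 1 to Measure N and their expressions are made up for illustration.

```python
def define_clause_measures(count, table="Sales"):
    """Return `count` query-scoped DAX measure definitions, one per line,
    indented to sit under the DEFINE keyword of a query."""
    return "\n".join(
        f"    MEASURE '{table}'[Measure {i}] = [Sales Amount] + {i}"
        for i in range(1, count + 1)
    )


text = define_clause_measures(3000)
print(len(text.splitlines()))  # 3000
```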

Here’s what Server Timings showed for the same query, which, remember, does not actually use any of the 3000 measures that I added:

Now 3000 measures might seem excessive but I have seen people with that many: you could have 100 base measures and then 30 combinations of different KPIs (time intelligence calculations, financial calculations like actual vs forecast and so on). My advice would be to use calculation groups instead of creating so many measures, if you can – they will be a lot easier to develop and maintain, and for anyone developing a report to use. It’s also worth making clear that this problem only happens with query-scoped measures: no dependency analysis takes place at query time with measures defined on the dataset.

Also, 1.5 seconds might not seem a big overhead, but if you’re trying to squeeze all the performance you can out of a query, or trying to understand what’s contributing to the overall performance of your query, this is good to know about.

[Thanks to Jeffrey Wang for providing the information in this post]

Excel Cube Functions, Dynamic Arrays And Lambdas, Part 3: Grouping And Histograms

In the last post in this series I showed how you can use Excel’s new Lambda helper functions to return tables. In this post I’ll show you how you can use them to return a dynamic array of CubeSet functions which can be used to build a histogram and do the kind of ABC-type analysis that can be difficult to do in a regular Power BI report.

For the examples in this post I added some rows to the Excel Data Model table that I’m using to hold my source data:

The aim here is to put these products into an arbitrary number of groups, or buckets, based on their sales. To define these buckets I created another Excel table called Buckets that has three columns: the name of the bucket, and the lower bound and the upper bound of the sales amount that determines whether a product should fall into the bucket:

I then created two dynamic array formulas using the new Map function. In cell G2 I added this formula:

=
MAP(
 Buckets[Bucket Name], 
 Buckets[Lower Bound], 
 Buckets[Upper Bound], 
 LAMBDA(
  n,
  l,
  u, 
  CUBESET(
   "ThisWorkbookDataModel", 
   "FILTER([Sales].[Product].[Product].MEMBERS, [Measures].[Sales Amount]>=" & l & 
   " AND [Measures].[Sales Amount]<=" & u & ")", 
   n)
  )
)

And in cell H2 I added this formula:

=MAP(
 G2#, 
 LAMBDA(
  s, 
  IF(
   CUBESETCOUNT(s)>0, 
   CUBEVALUE(
    "ThisWorkbookDataModel", 
    s, 
    "[Measures].[Sales Amount]"),
   0)
  )
)

Here’s what these two formulas return:

The formula in G2 takes three arrays – the values from the three columns in the Buckets table – and then loops over the values in those columns and uses the CubeSet function to return a set of the Products whose sales are between the lower and upper bounds. Since there are two rows in the Buckets table, this formula returns two sets. The formula in H2 uses the CubeValue function to return the aggregated sales amount for each set.
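The same bucketing logic, written as a plain Python sketch with hypothetical product sales and bucket bounds, may make it clearer what the two MAP formulas compute: for each row of the Buckets table, find the products whose sales fall between the lower and upper bound (inclusive, like the >= and <= in the MDX FILTER), then add up their sales.

```python
def bucket_products(product_sales, buckets):
    """product_sales: {product: sales}; buckets: [(name, lower, upper)].
    Returns [(name, products in bucket, total sales)], one row per bucket."""
    result = []
    for name, lower, upper in buckets:
        # Equivalent of FILTER(Products, Sales >= lower AND Sales <= upper)
        members = [p for p, sales in product_sales.items() if lower <= sales <= upper]
        # H2 guards the empty set with IF(CUBESETCOUNT(s)>0, ..., 0);
        # in Python, summing an empty list already gives 0.
        total = sum(product_sales[p] for p in members)
        result.append((name, members, total))
    return result


sales = {"Apples": 10, "Oranges": 25, "Pears": 40}  # hypothetical values
buckets = [("Low", 0, 20), ("High", 21, 100), ("Huge", 101, 1000)]
for row in bucket_products(sales, buckets):
    print(row)
# ('Low', ['Apples'], 10)
# ('High', ['Oranges', 'Pears'], 65)
# ('Huge', [], 0)
```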

Last of all I created a column chart bound to the values in G2 and H2. This was a bit tricky to do, but I found the answer in this video from Leila Gharani – you need to create names that return the contents of the ranges G2# and H2# and then use the names in the chart definitions.

The beauty of all this is that when you edit the ranges in the Buckets table in the top left of the worksheet, edit the names of the buckets or add new buckets, the table and chart update automatically.

After doing all this I realised there was another, probably easier way to achieve the same result without using the Map function. All I needed to do was to add new calculated columns to the bucket table to return the sets and values:

Here’s the formula for the Set column in the table above:

=CUBESET(
"ThisWorkbookDataModel", 
"FILTER([Sales].[Product].[Product].MEMBERS, [Measures].[Sales Amount]>=" & 
[@[Lower Bound]] & 
" AND [Measures].[Sales Amount]<=" & 
[@[Upper Bound]] & 
")", 
[@[Bucket Name]] & 
" set"
)

…and here’s the formula for the Sales column in that table:

= IF(
CUBESETCOUNT(
[@Set])>0, 
CUBEVALUE(
"ThisWorkbookDataModel", 
[@Set], 
"[Measures].[Sales Amount]"
),
0
)

I think this second approach should work with any version of Excel since the introduction of tables and cube formulas.

Excel Cube Functions, Dynamic Arrays And Lambdas, Part 2: Returning Tables

In the first post in this series I showed how to use the new Excel Lambda helper functions to return an array containing all the items in a set. That isn’t very useful on its own, so in this post I’ll show you how to generate an entire dynamic table using Excel cube functions and Lambda helper functions.

In this post I’ll be using the same source data as in my previous post: a table containing sales data with just two columns.

With this table added to the Excel Data Model/Power Pivot, I created two measures:

I then created two sets using CubeSet containing the sets of Products (in cell B2 of my worksheet) and Measures (in cell B4) to use in my table:

=CUBESET("ThisWorkbookDataModel", "[Sales].[Product].[Product].MEMBERS", "Product Set")

=CUBESET("ThisWorkbookDataModel", "{[Measures].[Sales Amount], [Measures].[Forecast Sales]}", "Measure Set")

Here are the formulas shown in the worksheet:

And here’s the output – remember you only see the text in the third parameter displayed in the cell:

Now, here’s the fun part – a single formula that takes these sets and builds a table with the Measures on columns and the Products on rows:

=MAKEARRAY(
  CUBESETCOUNT(B2)+1,
  CUBESETCOUNT(B4)+1,
  LAMBDA(r,c,
   SWITCH(
    TRUE(),
    AND(r=1,c=1),
    "",
    c=1,
    CUBERANKEDMEMBER("ThisWorkbookDataModel",$B$2,r-1),
    r=1,
    CUBERANKEDMEMBER("ThisWorkbookDataModel",$B$4,c-1),
    CUBEVALUE("ThisWorkbookDataModel",
     CUBERANKEDMEMBER("ThisWorkbookDataModel",$B$2,r-1),
     CUBERANKEDMEMBER("ThisWorkbookDataModel",$B$4,c-1)
    )
   )
  )
)

Here’s what this formula returns:

How does this work? Going through the MakeArray function step-by-step:

- The first two parameters specify that the output will be an array with one more row than there are items in the Product set and one more column than there are items in the Measures set.
- The third parameter returns a Lambda that is called for every cell in this array. This Lambda contains a Switch with the following conditions:
  - For the top-left cell in the array, return a blank value
  - In the first column, use the CubeRankedMember function to return the Products on the rows of the table
  - In the first row, use the CubeRankedMember function to return the Measures on the columns of the table
  - In the body of the table, use the CubeValue function to return the values
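The steps above can be mirrored in a small Python sketch. Here make_array is a hypothetical stand-in for MAKEARRAY (it calls a function with 1-based row and column numbers), list indexing stands in for CUBERANKEDMEMBER, and a dictionary lookup with made-up numbers stands in for CUBEVALUE.

```python
def make_array(rows, cols, fn):
    """Stand-in for MAKEARRAY: call fn(r, c) with 1-based row/column numbers."""
    return [[fn(r, c) for c in range(1, cols + 1)] for r in range(1, rows + 1)]


def build_table(products, measures, value):
    """Products down the side, measures across the top, values in the body."""

    def cell(r, c):
        # CUBERANKEDMEMBER ranks start at 1, so rank r-1 is Python index r-2
        if r == 1 and c == 1:
            return ""                    # top-left corner is blank
        if c == 1:
            return products[r - 2]       # CUBERANKEDMEMBER(product set, r-1)
        if r == 1:
            return measures[c - 2]       # CUBERANKEDMEMBER(measure set, c-1)
        return value(products[r - 2], measures[c - 2])  # CUBEVALUE(...)

    return make_array(len(products) + 1, len(measures) + 1, cell)


values = {  # hypothetical numbers
    ("Apples", "Sales Amount"): 10, ("Apples", "Forecast Sales"): 12,
    ("Oranges", "Sales Amount"): 20, ("Oranges", "Forecast Sales"): 18,
}
table = build_table(["Apples", "Oranges"], ["Sales Amount", "Forecast Sales"],
                    lambda p, m: values[(p, m)])
for row in table:
    print(row)
```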

Here’s a slightly more ambitious version that returns the same table but adds a total row to the bottom:

=
LET(
 NumberOfRows,
 CUBESETCOUNT(B2)+2,
 NumberOfColumns,
 CUBESETCOUNT(B4)+1,
 MAKEARRAY(
  NumberOfRows,
  NumberOfColumns,
  LAMBDA(r,c,
   SWITCH(
    TRUE(),
    AND(r=1,c=1),
    "",
    AND(r=NumberOfRows,c=1),
    "Total",
    r=NumberOfRows,
    CUBEVALUE("ThisWorkbookDataModel",
     $B$2,
     CUBERANKEDMEMBER("ThisWorkbookDataModel",$B$4,c-1)),
    c=1,
    CUBERANKEDMEMBER("ThisWorkbookDataModel",$B$2,r-1),
    r=1,
    CUBERANKEDMEMBER("ThisWorkbookDataModel",$B$4,c-1),
    CUBEVALUE("ThisWorkbookDataModel",
     CUBERANKEDMEMBER("ThisWorkbookDataModel",$B$2,r-1),
     CUBERANKEDMEMBER("ThisWorkbookDataModel",$B$4,c-1))
    )
   )
  )
)

Two extra things to note here:

- This is a great example of a complex formula where the new Excel Let function can be used to improve readability and prevent the same value being evaluated twice.
- The values in the Total row are calculated in the Excel Data Model, not on the worksheet, by using the CubeSet function inside the CubeValue function. This means that the totals will be consistent with what you see in a PivotTable and therefore correct.
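Why does calculating totals in the model matter? For a simple sum, adding up the visible cells gives the same answer, but that is not true for a non-additive measure. The Python sketch below uses a hypothetical distinct count of customers (not one of the two measures in this post) and made-up rows to show the difference between summing the cells in a Total row and evaluating the measure over the whole set, which is what CUBEVALUE with the $B$2 set does.

```python
# Hypothetical rows: (product, customer)
SALES = [("Apples", "Alice"), ("Apples", "Bob"), ("Oranges", "Alice")]


def customer_count(rows, products):
    """Hypothetical distinct-count measure evaluated over a set of products."""
    return len({customer for product, customer in rows if product in products})


per_product = {p: customer_count(SALES, {p}) for p in ("Apples", "Oranges")}
worksheet_total = sum(per_product.values())                 # adding up the cells
model_total = customer_count(SALES, {"Apples", "Oranges"})  # like CUBEVALUE over the set
print(per_product, worksheet_total, model_total)  # {'Apples': 2, 'Oranges': 1} 3 2
```

Alice bought both products, so the worksheet sum counts her twice while the model counts her once.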

This is still very much a proof-of-concept. I need to look at the performance of this approach (it may not be optimal and may need tuning), and I’m not sure how a table like this could be formatted dynamically (especially the Total row). It is exciting though!

Excel Cube Functions, Dynamic Arrays And Lambdas, Part 1: Getting All The Items From A Set

After my recent post on using Office Scripts and cube functions to generate Excel reports from Power BI data, Meagan Longoria asked me this question on Twitter:

To which I can only reply: guilty as charged. I have always loved the Excel cube functions even though they are maybe the least appreciated, least known and least used feature in the whole Microsoft BI stack. They have their issues (including sometimes performance) but they are great for building certain types of report in Excel that can’t be built any other way.

Anyway, the recent addition of new Lambda helper functions to Excel has made me particularly happy because they can be used with cube functions to overcome some limitations that have existed since cube functions were first released in Excel 2007, and to do some other cool things too. In this series of posts I’m going to explore some of the things they make possible.

Let’s start with something simple. In Excel, the CubeSet function can be used to return an (MDX) set of items. This set is stored in a single cell, though, and to extract each item into a cell on your worksheet you need to use the CubeRankedMember function. For example, let’s say I have a table called Sales on my worksheet:

…that is then loaded into the Excel Data Model (aka Power Pivot – although this works exactly the same if I use a Power BI dataset, Azure Analysis Services or SQL Server Analysis Services as my source):

What you can then do is use the CubeSet function to create a set of all the products like so:

=CUBESET("ThisWorkbookDataModel", "[Sales].[Product].[Product].MEMBERS", "Product Set")

…and then use the CubeRankedMember function to put each individual item of the set into a cell. Here’s a simple example worksheet, first with the formulas showing and then the results:

This example shows the fundamental problem that has always existed with CubeRankedMember though: in order to show all the items in a set you need to know how many items there are in advance, and populate as many cells with CubeRankedMember formulas as there are items. In this case see how the range B4:B6 contains the numbers 1, 2 and 3; these numbers are used in the formulas in the range C4:C6 to get the first, second and third items in the set.

If a fourth product was added to the table, however, it would not appear automatically – you would have to add another cell with another CubeRankedMember formula in it manually. I’ve seen some workarounds but they’re a bit hacky and require you to know what the maximum possible number of items in a set could ever be. Indeed that’s always been one of the key differences between cube functions and PivotTables: cube functions are static whereas PivotTables can grow and shrink dynamically when the data changes.

The new MakeArray function in Excel provides a really elegant solution to this problem: you can now write a single formula that returns a dynamic array containing all the items in the set. Assuming that the same CubeSet exists in B2 as shown above, you can do the following:

=MAKEARRAY(CUBESETCOUNT($B$2), 1, LAMBDA(r,c,CUBERANKEDMEMBER("ThisWorkbookDataModel",$B$2,r)))

Here’s the output:

Notice how the formula in cell B4 returns an array containing all three items in the set, which spills into the range B4:B6.

How does this work?

- The CubeSetCount function is used to get the number of items in the CubeSet in B2.
- The MakeArray function is then used to create an array with the number of rows returned by CubeSetCount and one column.
- In the third parameter of MakeArray, the Lambda function is used to return a function that wraps CubeRankedMember, which is then called with the current row number of the array.
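Here is the same idea as a minimal Python sketch, with a list standing in for the CubeSet and made-up product names: the height of the result comes from the size of the set (the CUBESETCOUNT step), and each row holds the member at rank r, counting from 1 as CUBERANKEDMEMBER does. Adding items to the list grows the output without changing the code, which is the point of the MakeArray approach.

```python
def spill_set(members):
    """Mimic MAKEARRAY(CUBESETCOUNT(set), 1, LAMBDA(r, c, CUBERANKEDMEMBER(..., set, r)))."""
    count = len(members)  # CUBESETCOUNT
    # rank r is 1-based, so the member at rank r is members[r - 1]
    return [[members[r - 1]] for r in range(1, count + 1)]


print(spill_set(["Apples", "Oranges", "Pears"]))
# [['Apples'], ['Oranges'], ['Pears']]
print(len(spill_set(["Apples", "Oranges", "Pears", "Bananas", "Grapes"])))  # 5
```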

The nice thing about this is that when more products are added to the Sales table they automatically appear in the output of the MakeArray formula in B4. So, for example, with two more products added to the Sales table like so:

Here’s the new output of the formula, showing the two new products returned in the array automatically:

This is not very useful on its own though. In my next post I’ll show you how this can be used to build a simple report.

Power BI Dataset Refresh, Column Encoding And The First Partition

If you’ve been following some of my recent posts about improving Power BI refresh performance by partitioning tables you will have seen a lot of screenshots that look like the one below:

It’s a visualisation from a report created by my colleague Phil Seamark (as detailed in this blog post) showing how long all the partitions in a dataset take to refresh. If you look at these visualisations you’ll probably ask the same question I did: why does the first partition always start before the others?

It turns out this is because, when a table is refreshed, the first thing that has to happen is that a certain amount of data is read so that the type of encoding (Value or Hash) used for each column can be determined. In most cases tables only contain one partition, so it’s not obvious that this is happening, but when a table has more than one partition this happens only for the first partition, which explains why the first partition seems to start before the others. You can’t avoid it happening, but you can reduce the impact a little by using encoding hints (see here and here for more details): this process can be skipped for columns that have a Hash encoding hint, or that the engine knows in advance must use Hash encoding, although it cannot be skipped for columns that have a Value encoding hint. What’s more, the Execute SQL event for the first partition will have to complete before the Execute SQL events for all the other partitions can start.

[Thanks to Akshai Mirchandani for the information in this post]

Sample HTTP Endpoint For Querying Azure Analysis Services And Power BI Premium

I’ve always thought it would be useful if we could run queries against AAS or Power BI via a simple REST API and I’ve just come across a really nice sample project created by one of my colleagues at Microsoft, David Browne, that allows you to do just that. You can find it here:

https://github.com/microsoft/azure-analysis-services-http-sample

From the ReadMe:

This sample is a HTTP proxy for XMLA endpoints, intended for use with Power BI Premium or Azure Analysis Services.

It’s implemented as ASP.NET Core 5 API Project. The main API is /api/Query which allows you to POST a DAX query and receive the results as a JSON result.

The sample is coded to pass-through authentication from the client to the XMLA endpoint. So to call the API either use HTTP BASIC auth over HTTPS, passing credentials with the request. It’s highly advised that this be a Service Principal, rather than an AAD user. To specify a Service Principal use a UserName of the form app:[ClientID]@[TenantID], and pass a Client Secret as the Password.

For better security, instead of passing a UserName/Password using HTTP BASIC auth, fetch a Bearer token for your XMLA endpoint. To fetch a token use the Resource ID https://analysis.windows.net/powerbi/api for Power BI, or https://*.asazure.windows.net for Azure Analysis Services
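To show what calling the sample might look like from a client, here is a Python sketch that only builds the request (it doesn't send it). The base URL is hypothetical, and the exact body format the /api/Query endpoint expects is an assumption here (the DAX query as plain text), so check the sample's README before relying on it. The service principal user name format app:[ClientID]@[TenantID] and the Bearer token option come from the README quoted above.

```python
import base64
import urllib.request


def build_query_request(base_url, dax, *, bearer_token=None,
                        client_id=None, tenant_id=None, client_secret=None):
    """Build (but do not send) a POST to the sample's /api/Query endpoint.

    Assumption: the DAX query is sent as the plain-text request body.
    """
    if bearer_token:
        authorization = "Bearer " + bearer_token
    else:
        # HTTP BASIC auth with a service principal, as the README describes;
        # only ever send this over HTTPS.
        user = f"app:{client_id}@{tenant_id}"
        raw = f"{user}:{client_secret}".encode("utf-8")
        authorization = "Basic " + base64.b64encode(raw).decode("ascii")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/Query",
        data=dax.encode("utf-8"),
        headers={"Authorization": authorization, "Content-Type": "text/plain"},
        method="POST",
    )


request = build_query_request("https://example.invalid", 'EVALUATE ROW("x", 1)',
                              bearer_token="<token>")
print(request.get_method(), request.full_url)  # POST https://example.invalid/api/Query
```

Passing the request to urllib.request.urlopen would send it; per the README, the response is the query result as JSON.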

I can think of all kinds of cool stuff you could do with this…

Visualising Azure Analysis Services Processing Tasks With The Job Graph Events Sample

I’ll leave it to other people to get worked up about things like Power BI Premium gen2 – what really gets me excited is a new Profiler event type. And indeed there is a new one to play with – Job Graph – that you can use to visualise Azure Analysis Services processing jobs (it’s not enabled yet for the Analysis Services engine in Power BI).

More details about it, and how it can be used, are in the samples here:

https://github.com/microsoft/Analysis-Services/tree/master/ASJobGraphEvents

The data returned by the Job Graph event isn’t intelligible if you look at the text it returns in Profiler. However if you save a .trc file with Job Graph event data to XML you can use the Python scripts in the GitHub repo to generate DGML diagrams that can be viewed in Visual Studio, plus Gantt charts embedded in HTML. Of course to do this you’ll need to have Python installed; you’ll also need to have Visual Studio and its DGML editor installed (see here for details).

Here’s the DGML diagram for the refresh of a very simple AAS database with just one table in it, called People:

The eight nodes in a semi-circle around the top of the diagram represent the eight jobs that process the attribute hierarchies in the table. These eight jobs must be completed before the single partition in the table is processed (this is the node in the centre of the diagram) and that partition must be processed before the table itself is processed (this is the node at the bottom of the diagram).

Since this is the annotated output you can hover over a node in the diagram and see more details about it, including its duration and how long it was blocked for:

Notice how three nodes have dark backgrounds? These nodes represent the critical path. From the sample documentation:

When the job graph executes, there is always a job that finishes last before the engine can commit the change. This job that finishes last is the “critical dependency” for the commit; the entire commit needs to wait for this one job to finish before executing. This last job depends on other jobs, one of which finished after all the others. This is the next critical dependency. Tracing this path of critical dependencies forms the critical path, which helps engineers and customers identify why processing takes so long.

…

To diagnose slow refresh times, look for the critical path and start at the top. Look at “Blocked duration”, “Waiting duration”, and “Running duration”. If a job has a long blocked duration, it spent a long time waiting on other jobs. If a job has a long waiting duration, it was waiting for an available thread, so increasing maxParallelism could help. If a job has a long running duration, then the model might have to be changed to speed up the job.
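The critical-path definition quoted above translates into a short procedure: start from the job that finished last, then repeatedly step back to whichever of its dependencies finished last. This Python sketch applies it to a made-up job graph shaped like the People example (eight attribute hierarchy jobs, then the partition, then the table); the finish times are hypothetical, and this is not the code the sample's scripts use.

```python
def critical_path(jobs):
    """jobs: {name: (finish_time, [names of jobs it depends on])}.
    Returns the critical path, listed from the top of the graph down."""
    current = max(jobs, key=lambda name: jobs[name][0])  # job that finishes last
    path = [current]
    while jobs[current][1]:
        # the dependency that finished last is the next critical dependency
        current = max(jobs[current][1], key=lambda name: jobs[name][0])
        path.append(current)
    return list(reversed(path))


hierarchies = [f"Hierarchy {i}" for i in range(1, 9)]
jobs = {name: (i, []) for i, name in enumerate(hierarchies, start=1)}
jobs["Partition"] = (12, hierarchies)
jobs["Table"] = (13, ["Partition"])
print(critical_path(jobs))  # ['Hierarchy 8', 'Partition', 'Table']
```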

Here’s another diagram, generated from a slightly more complex database with four tables, a calculated column and a calculated table:

There’s also a script that takes this data and generates a Gantt chart from it, but at the time of writing it seems to be a bit buggy; I can make it work if I comment out some code but I suspect that means the output isn’t accurate. Hopefully it will get fixed soon, but in the meantime here’s an example of what it produces:

I think this is going to be really useful when I’m troubleshooting processing performance problems. It would also be really cool to have this in DAX Studio (hint, hint)…

Visualising Power BI Premium And Azure Analysis Services Query Parallelism

In my last post I showed how to connect SQL Server Profiler up to a Power BI Premium dataset but I didn’t give you any examples of why this might be useful. In this post I’ll show you how you can use a Profiler trace to visualise all the queries run by a Power BI report, see when they start to run, see which ones run in parallel with each other and see what the overall time taken to run all the queries is.

Why is this important? When you’re tuning the performance of a Power BI report the first thing to do is to look at the performance of the individual DAX queries run and make them run as fast as possible. However, when a Power BI report is rendered, any one query is likely to be run at the same time as several other queries run for the same report, and this will have an impact on its performance. How much of an impact there is will depend on how many queries need to be run and the number of back-end v-cores available on your Premium capacity, or the number of QPUs available on your Azure Analysis Services instance if you’re using a Live connection to AAS. The more v-cores/QPUs you have available, the more of the work needed for a query can be run in parallel; you can see a table listing the number of v-cores for each Premium SKU here, and the number of QPUs for each Azure Analysis Services SKU here. As a result, if you have reports with a large number of visuals that generate slow DAX queries, scaling up your Power BI Premium capacity or AAS instance may improve overall report performance. Reducing the number of visuals on your report, for example by finding ways to display the same information with fewer visuals, will also reduce the number of queries that need to be run and therefore improve overall performance.

As I showed last week, SQL Server Profiler can be used to create a trace that logs all the queries run against a Power BI Premium dataset in the same way as it can be used with Azure Analysis Services. Assuming that you have a trace running that uses only the Query End event, this will give you a list of all the queries that are being run along with their start time, end time, duration and a lot of other interesting information. A table with all this data in can still be difficult to interpret though, so I built a Power BI template for a report that visualises all these queries and helps you understand the amount of parallelism that is taking place. You can download the template file here.

To use it, first you need a trace file. Make sure that no-one else is running reports on the Premium capacity you want to test (creating a Power BI Embedded capacity for testing purposes is a good idea) and then, when the trace is running, refresh your report using the technique I described in the “Use the network tab” section of this blog post. This will also allow you to correlate what you see in the trace with the information you see in the DevTools tab in the browser.

Then save the trace you created as an XML file by going to File/Save As/Trace XML File.


Next, open the Power BI template file and, when prompted, enter the full path of the trace XML file you just created.


A new Power BI report will then be created. If you want to point the report to a different trace XML file all you need to do is change the value of the TraceXMLFile Power Query parameter.
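If you want to inspect the same trace XML file outside Power BI, a few lines of Python can pull out the Query End events. This sketch assumes the file contains `Event` elements with a `name="Query End"` attribute, each holding `Column` children identified by a `name` attribute; check the element and attribute names against your own trace file before relying on it. The function names are hypothetical, and this is not how the template itself reads the file.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip any '{namespace}' prefix so matching works regardless of namespace."""
    return tag.rsplit("}", 1)[-1]


def read_query_end_events(source) -> list[dict[str, str]]:
    """Return one dict (column name -> text) per Query End event.

    `source` is a file path or binary file object, as accepted by ET.parse.
    Assumes <Event name="Query End"> elements with <Column name="..."> children.
    """
    root = ET.parse(source).getroot()
    events = []
    for event in root.iter():
        if local_name(event.tag) != "Event" or event.get("name") != "Query End":
            continue
        columns = {}
        for column in event:
            name = column.get("name")
            if local_name(column.tag) == "Column" and name:
                columns[name] = column.text or ""
        events.append(columns)
    return events
```

Each returned dict keeps the raw text of every column, so values such as the duration (reported in milliseconds in Analysis Services traces) still need converting to numbers before you do any arithmetic with them.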

On the first page you’ll see the name of the trace XML file you connected to, plus a bar chart showing each Query End event (with each query identified by a number) on the y axis and the duration of each query on the x axis.


It’s not quite a simple bar chart though. What I’ve done is:

Found the start time of the first query run
Calculated the start time of every other query in the file relative to this first start time (although, unfortunately, Profiler only gives you start times rounded to the nearest second, which means you can’t know exactly when a query starts)
Created a stacked bar chart where the first value in the stack is this relative start time and the second value is the duration of the query in seconds
Made the colour of the relative start time transparent, so you only see the blue sections of the bar for the query durations. This gives you a waterfall-like effect and allows you to see which queries are run in parallel. This also makes it easy to see the total amount of time taken to run your queries, from the start of the first query to the end of the last query, which is just as useful to know as the duration of any single query.
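The calculations behind this chart can be sketched in Python. This is not the code used in the template, just an illustration of the same steps under the assumption that you already have a start time and a duration in seconds for each query: subtract the earliest start time from every start time, treat each bar as relative start plus duration, and take the latest end as the total elapsed time. It also counts the peak number of overlapping queries with a simple sweep over start and end points. `waterfall` and `peak_concurrency` are hypothetical names, the sample times are made up, and since Profiler rounds start times to the nearest second any overlap count is only approximate.

```python
from datetime import datetime, timedelta


def waterfall(queries: list[tuple[datetime, float]]) -> tuple[list[tuple[float, float]], float]:
    """queries: (start_time, duration_seconds) for each Query End event.

    Returns one (relative_start_seconds, duration_seconds) pair per query, i.e. the
    transparent and visible parts of each stacked bar, plus the total elapsed time
    from the start of the first query to the end of the last one.
    """
    first_start = min(start for start, _ in queries)
    bars = [((start - first_start).total_seconds(), duration) for start, duration in queries]
    total_elapsed = max(relative + duration for relative, duration in bars)
    return bars, total_elapsed


def peak_concurrency(bars: list[tuple[float, float]]) -> int:
    """Largest number of queries running at the same instant."""
    points = []
    for relative, duration in bars:
        points.append((relative, 1))              # query starts
        points.append((relative + duration, -1))  # query ends
    # Tuples sort by time, then -1 before +1, so a query ending exactly when
    # another starts is not counted as overlapping with it.
    points.sort()
    running = peak = 0
    for _, change in points:
        running += change
        peak = max(peak, running)
    return peak


# Made-up example: three queries starting 0s, 1s and 5s after the first one.
t0 = datetime(2020, 6, 1, 9, 0, 0)
bars, total = waterfall([(t0, 2.0), (t0 + timedelta(seconds=1), 3.0), (t0 + timedelta(seconds=5), 1.0)])
print(bars)                    # [(0.0, 2.0), (1.0, 3.0), (5.0, 1.0)]
print(total)                   # 6.0
print(peak_concurrency(bars))  # 2
```

Note that the total elapsed time (6 seconds here) is less than the sum of the individual durations (6 seconds of query time would only match it if nothing overlapped; here it is 6 seconds of wall-clock time for 6 seconds of work because the second query runs alongside the first and a gap follows), which is why it is worth looking at alongside any single query’s duration.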
There’s also a drillthrough page so you can right-click on a bar and see a table with the DAX query for the query you clicked on, as well as its start time and duration.

It’s a very basic report, I know, and I would be interested to know if you have any ideas about other ways of visualising this data. What’s more, a visual like this raises more questions than I know how to answer… yet. For example, one thing I want to investigate is the effect that query interleaving has on this graph and on both perceived and actual report performance. So stay tuned for more blog posts on this subject!

Category: Azure Analysis Services
