Troubleshooting Web Service Refresh Problems In Power BI With The Power Query Diagnostics Feature

Back in 2018 I wrote a blog post about how to troubleshoot web service refresh issues using a third-party tool called Fiddler. It’s still relevant, but Fiddler is a complex tool and installing it and giving it the right permissions to see everything it needs to see isn’t easy. Luckily, the new Power Query Diagnostics feature in Power BI (released in October 2019) means you don’t need to use Fiddler for this kind of work any more. Full details on how to use this feature are given here, but I thought it would be useful to focus on one particular use case for it.

Let’s take one of the Power Query queries from my previous post as an example:

let
    Source = OData.Feed("https://api.parliament.uk/odata"),
    GovernmentOrganisation_table =
    Source{[
        Name="GovernmentOrganisation",
        Signature="table"
    ]}[Data],
    #"Filtered Rows" =
    Table.SelectRows(
        GovernmentOrganisation_table,
        each ([GroupName] = "Cabinet Office")
    )
in
    #"Filtered Rows"

[Note that this is a public web service which requires no authentication, so you will be able to run this query yourself]

This query connects to the UK Houses of Parliament OData API and returns the table of government organisations filtered down to the row where the GroupName column equals the text “Cabinet Office”. What requests are made to the web service when this query runs?

The first thing to point out is that different things will happen depending on whether you refresh the preview window in the Power Query Editor or refresh the dataset inside the main Power BI window. If you’re doing performance tuning you should ignore what happens when you refresh the preview in the Power Query Editor and always refresh from the main Power BI window.

The second is that, when you refresh a dataset (or a table within a dataset) from the main Power BI window, you might find that more work is being done than you expect as a result of “background refresh” taking place. This is something I blogged about here and you should always turn it off before doing any performance tests, because it’s not something that will happen in the service and it can also slow things down – potentially a lot if there are lots of queries and steps. You can turn it off from the Options dialog in the Data Load tab by deselecting “Allow data preview to download in the background”.


The third is that the Query Diagnostics feature I’m going to use is, at the time of writing, in Preview, so you may need to enable it in the Preview features pane of the Options dialog:


With that done, here are the steps to find out what requests are made to the web service when the above query is run.

  1. First, let the query load data into the dataset as usual without doing anything else. You’ll see the table that the query has loaded data into in the Fields pane of the main Power BI window.
  2. Next, reopen the Power Query Editor window, go to the Tools tab and click on the Start Diagnostics button.
  3. Do not do anything else in the Power Query Editor window; instead, minimise (do not close) it and go back to the main Power BI window. There, right-click on the table that holds the output of the query in the Fields pane and select “Refresh data”.
  4. Once the refresh has taken place, go back to the Power Query Editor window and click the Stop Diagnostics button.
  5. When you have done that, you’ll see two new queries created in a group called Diagnostics. These queries contain the diagnostics information for the refresh you’ve just done. The query with “Detailed” in its name has all the detailed information; the other query contains summarised data.
  6. Looking at the “Detailed” query, each row represents a single operation that took place within the Power Query engine during the refresh. While I don’t know what all of this information means (I’m working on it!), if you go to the Data Source Query column you’ll see a list of all the HTTP requests made to the web service (see the sketch after this list for a quick way to isolate them).
  7. Clicking on an individual cell in this column allows you to see the full text of the request and response in the preview pane underneath the results pane.
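If you only care about the HTTP requests, one quick way to isolate them is to filter the detailed diagnostics query down to the rows where the Data Source Query column is populated. Here’s a minimal sketch in M – note that the real detailed diagnostics query has a timestamped name, so the name Detailed below is an assumption:

let
    // "Detailed" stands in for the real name of the detailed diagnostics
    // query, which includes a timestamp
    Source = Detailed,
    // keep only the operations that actually sent a request to the data source
    OnlyRequests = Table.SelectRows(
        Source,
        each [Data Source Query] <> null and [Data Source Query] <> ""
    )
in
    OnlyRequests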

While Fiddler may still be necessary for some more advanced scenarios, this is a much easier way of troubleshooting web service data sources in the Power Query Editor, for example when you need to check the exact request made to see if query folding is taking place on an OData data source.

Excel Dynamic Arrays And The CubeSet Function

Among all the exciting announcements made at Ignite last week, one you may have missed (even in all the Excel-related announcements here) was that dynamic arrays have finally reached GA. Ever since they were announced I’ve been interested in how they can be used with Excel cube functions, which allow you to get data from Analysis Services, Power BI and the Excel Data Model/Power Pivot into cells on the worksheet, and I’ve blogged about this once already. Even though right now the CubeValue function can’t be used with dynamic arrays – alas – there are still some interesting ways dynamic arrays and cube functions can be used together and in this post I’ll show you another one I’ve found.

Say you have a set of tables in the Excel Data Model that includes an Internet Sales fact table and a Product table.

And the following measure that counts the number of customers who bought something:

Distinct Customers:=DISTINCTCOUNT('Internet Sales'[CustomerKey])

Let’s also say you have a table on a worksheet showing the number of distinct customers broken down by product, created using the CubeMember and CubeValue functions.

Here are the formulas for this table:

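The pattern is the usual CubeMember/CubeValue one; here’s a plausible sketch for the first row of the table (the member unique name is an assumption), with the formulas filled down as far as row 134:

B5: =CUBEMEMBER("ThisWorkbookDataModel", "[Product].[Product].&[1]")
C5: =CUBEVALUE("ThisWorkbookDataModel", "[Measures].[Distinct Customers]", $B5)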

So far, all very straightforward. Now let’s say you want to use a dynamic array to filter the rows in this table so you only see the products that have more than a certain number of distinct customers, and you want to see a grand total for this filtered list too.

To explain what’s going on here (the formulas are collected together in a sketch after this list):

  • Cell C2 contains the text “ThisWorkbookDataModel”, the name of the connection to the Excel Data Model. Doing this makes the other Excel formulas here much easier to read!
  • Cell F2 – which I’ve turned into a named range called FilterThreshold – contains the number of customers which a product must exceed to be shown.
  • Cell E5 contains the dynamic array Filter function that filters the rows shown in the first two screenshots above and only returns the products that exceed the threshold. The formula is:
    FILTER(B5:B134,C5:C134>FilterThreshold)
  • Cell E6 does something similar to E5 but returns the number of distinct customers for each product. If all I wanted was the products and the number of distinct customers I wouldn’t have needed two separate formulas – I could have used one – but I need to add the grand total too…
  • Cell E4 contains the formula:
CUBESET($C$2, E5#, "Total")
    This creates a set from the spill reference E5#, which contains the cells containing CubeMember formulas returned by the Filter function. For some reason CubeSet returns an error if you try to use a dynamic array formula in its second parameter but using a spill reference works ok; this is why I had to use separate formulas in E5 and E6.
  • Finally, cell F4 contains the formula that returns the total distinct count for all products returned in the filtered list using CubeValue and the set created in cell E4:
CUBEVALUE($C$2, "[Measures].[Distinct Customers]", E4)
    Since this is a distinct count the only way to get the correct value here is using CubeSet, CubeValue and the DAX measure – there would be no way to calculate the correct value using Excel formulas. The same could be said of almost any other DAX measure that wasn’t a simple aggregation.
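Pulling all of this together, here’s a sketch of how the formulas might sit on the worksheet; E6’s formula isn’t quoted above, so it’s an assumption based on its description:

E4: =CUBESET($C$2, E5#, "Total")
E5: =FILTER(B5:B134, C5:C134>FilterThreshold)
E6: =FILTER(C5:C134, C5:C134>FilterThreshold)
F4: =CUBEVALUE($C$2, "[Measures].[Distinct Customers]", E4)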

You can download the sample workbook for this post here.

Power BI Report Builder And RSCustomDaxFilter

If you’re building DAX queries using Power BI Report Builder you might notice something that looks like a new DAX function called RSCustomDaxFilter. For example, here’s a simple DAX query built on the Adventure Works Tabular demo database, with one measure and one parameter based on the Calendar Year column.

Note that the “Enable Multi Value Parameters” option has been selected. Here’s what you’ll see if you view the text of the DAX query in the Query Designer:

EVALUATE 
SUMMARIZECOLUMNS(
RSCustomDaxFilter(
 @DateCalendarYear,
 EqualToCondition,
 [Date].[Calendar Year],
 Int64
 ), 
"Internet Total Sales", 
[Internet Total Sales]
)

But what is RSCustomDaxFilter? If you run your report and see what happens on the server using SQL Server Profiler, here’s the query that actually gets run (in this case I selected the years 2013 and 2014 for the parameter):

EVALUATE
SUMMARIZECOLUMNS (
    FILTER (
        VALUES ( 'Date'[Calendar Year] ),
        ( 'Date'[Calendar Year] = VALUE ( "2013" ) )
            || ( 'Date'[Calendar Year] = VALUE ( "2014" ) )
    ),
    "Internet Total Sales", [Internet Total Sales]
)

What has happened is that RSCustomDaxFilter has been replaced with an expression using the DAX Filter() function that implements the filter on the selected years; it’s just a placeholder for a dynamically-generated DAX expression that is substituted in at runtime.

Why is it needed? Handling multi-value parameters is difficult in DAX when you don’t know how many values are going to be passed to the parameters (it’s a subject I’ve blogged about here and here) and some kind of dynamic code generation is a reasonable solution to this problem.

There is one drawback with this approach though – it can generate a DAX query that is too long to be executed. Here’s the error message you’ll see:


The specified query is too complex to be evaluated as a single statement.

I’m not sure what the maximum length of a query is in DAX – I suspect it’s 32768 characters. You’ll hit this limit if you create a parameter on a field with thousands of values in it and then select all but a few of those values; from the example above you can imagine how long the resulting DAX query could be.

There’s no single workaround for this: some custom DAX (such as the example here) will be necessary, and you’ll need to ensure that the resulting DAX query is as short as possible.
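One approach I’ve used before is to concatenate all the selected parameter values into a single delimited string and test membership with the DAX PATHCONTAINS function. Here’s a sketch – the parameter name @DateCalendarYearList is made up, and it assumes the selected values have been joined into a pipe-delimited string in the report:

EVALUATE
SUMMARIZECOLUMNS (
    FILTER (
        VALUES ( 'Date'[Calendar Year] ),
        // TRUE if the year appears anywhere in the pipe-delimited list
        PATHCONTAINS ( @DateCalendarYearList, 'Date'[Calendar Year] )
    ),
    "Internet Total Sales", [Internet Total Sales]
)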

Excel PivotTable Query Performance With Analysis Services, Power Pivot And Power BI (Revisited)

Back in 2016 I wrote the following blog post about changes to the way Excel 365 generated MDX queries for PivotTables connected to Analysis Services, Power Pivot/the Excel Data Model and Power BI datasets:

https://blog.crossjoin.co.uk/2016/07/08/excel-2016-pivottable-mdx-changes-lead-to-big-query-performance-gains/

I know it sounds boring and not something you need to worry about but trust me, this is important – these changes solved the vast majority of Excel PivotTable performance problems that I encountered when I was a consultant so you should read the above post before continuing.

Unfortunately, earlier this year these changes had to be partially rolled back because in some rare cases the queries generated returned incorrect results; this means that you may find that values for subtotals and grand totals are again being returned even when they aren’t being displayed. The good news is that you should still be able to get the improved performance with a few minor tweaks.

Using the example from the previous post – a PivotTable connected to SSAS MD and the Adventure Works cube (the queries generated for SSAS Tabular, Power Pivot and Power BI may be slightly different and slightly better, but the basic problem is the same) – even though the subtotals in the PivotTable are not displayed in Excel, the MDX query generated by Excel not only returns them but also returns others that are not needed at all: in fact 36 rows of data (although the query returns them as columns) rather than the 13 rows that are displayed. You can see this by running the MDX query in SQL Server Management Studio.

Any time you see a row or column containing an All Member (in this case All Customers or All Products) you know you are looking at a subtotal or grand total.
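To illustrate the difference – these are hand-written queries, not the exact ones Excel generates – compare the following two queries run against the Adventure Works cube:

// returns [All Customers] as well as the individual countries,
// so a grand total comes back too
SELECT
{[Measures].[Internet Sales Amount]} ON 0,
[Customer].[Customer Geography].MEMBERS ON 1
FROM [Adventure Works]

// names the Country level explicitly, so no All Member
// (and therefore no grand total) is returned
SELECT
{[Measures].[Internet Sales Amount]} ON 0,
[Customer].[Customer Geography].[Country].MEMBERS ON 1
FROM [Adventure Works]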

[NB The easiest way to get the MDX query generated by your PivotTable is to install the OLAP PivotTable Extensions add-in https://olappivottableextensions.github.io/]

In this case, to get an efficient query, you need to explicitly turn off subtotals and grand totals for the PivotTable in Excel.

In the PivotTable the only difference you’ll see is that the grand totals are no longer displayed.

But the query generated now only returns the values that are actually needed, and as a result will be a lot more efficient and potentially a lot faster. In this case, the query now only returns the 12 rows of data displayed.

Even with subtotals and grand totals turned off there are still some cases where unwanted values will be returned. Take the following PivotTable, where I have used the Calendar hierarchy from the Date dimension on columns and filtered it so I only see the three months in Q1 CY 2012.

The MDX query generated by Excel requests four extra columns with subtotals and a grand total that aren’t displayed.

There is a fairly simple workaround though. By changing how the PivotTable is constructed – in this case by not putting a hierarchy on columns but just the Month Of Year attribute, then adding slicers for Calendar Year and Calendar Quarter to control which months are displayed – you can get an efficient query. Rebuilt this way, the PivotTable results in a query being generated that only returns the required values.

To sum up, then, if you have a PivotTable that seems a bit slow:

  • Turn off subtotals and grand totals unless you really need to see them
  • Look at the MDX query being generated by Excel and see if it is still returning unnecessary subtotals and grand totals, and if it is try rebuilding the PivotTable to get the same results in a different way (for example by not drilling down on hierarchies as in the last example) to get a more efficient query

[All the queries in this post were generated by Excel 365 version 1910 build 12130.20238]

Introduction To M In Power Query And Power BI

Since last week’s post proved very popular (it turns out people want video – who knew?) I thought I’d post the recording of a presentation I did earlier this year at the Power BI User Group in Wellington, New Zealand, called “Introduction to M in Power Query and Power BI”. It’s a session I’ve presented at various conferences and user groups over the years, but I think this is the best version of it.

It’s an hour-long introduction to the M language that sits behind Power Query in Power BI, Excel, dataflows, Analysis Services and other tools. It covers pretty much everything you need to know about the language, including let expressions; tables, records and lists; error handling; and writing custom M functions. There’s a small taster of these features below.
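Here’s a tiny, self-contained M query – not taken from the video, just a minimal sketch of a let expression, the three structured types, a custom function and error handling:

let
    // M's three structured types: a list, a record and a table
    MyList = {1, 2, 3},
    MyRecord = [Language = "M", IsFun = true],
    MyTable = #table(type table [X = number], {{1}, {2}, {3}}),
    // a simple custom function
    Double = (x as number) as number => x * 2,
    // error handling: accessing a field that doesn't exist raises an error,
    // which try ... otherwise traps
    SafeField = try MyRecord[DoesNotExist] otherwise "no such field",
    Output = [
        DoubledSum = Double(List.Sum(MyList)),
        RowCount = Table.RowCount(MyTable),
        Field = SafeField
    ]
in
    Output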

If you’re keen to watch even more Power BI-related videos, my colleague Matthew Roche has just started his own YouTube channel and published a video on dataflows vs datasets.

Why Does Power BI Query My Data Source More Than Once?

This week I was honoured to be a guest on the Guy In A Cube channel, and for my topic I decided to tackle one of life’s eternal questions: why does Power BI query my data source more than once when I refresh my dataset?

You can watch the video here.

Although I’ve never answered this question directly in a blog post before, almost every technique I showed in the video is something I’ve blogged about, so I thought it would be useful to collect all the links to these posts in one place to provide some background to what I show in the video.

The first thing to say is that this is really a Power Query question, not just a Power BI question; a lot of what I show will therefore be relevant to Excel’s Get & Transform, Power BI dataflows and every other manifestation of Power Query out there, although I can’t guarantee that all the options and behaviour will be the same in these places.

If you’re developing in Power BI Desktop and you think that refresh is taking a long time, you should definitely check whether the Power Query engine is hitting your data source more than once. There are lots of ways to do this. Some data sources have tools that show when they are queried, such as the Run History screen in Microsoft Flow that I show in the video or SQL Server Profiler. Other ways include using Fiddler for web services or Process Monitor for files.

Next, if you find that Power Query is indeed querying your data source multiple times, the most common explanation is that it’s doing a “background refresh”. This happens by default when you refresh a dataset and is Power Query refreshing all the data previews that you see when you click on a step in the Power Query Editor window. In most cases this is so quick you don’t notice it but when you have lots of queries with lots of steps it can be a gigantic problem. You can turn it off in the Options dialog in Power BI Desktop, as shown in the video, or in the Excel Query Options dialog. I blogged about this issue here.

Another possible reason is the data privacy settings you have applied. This is a gigantic, complex topic and something that I and other people have blogged about many times. I wrote a multi-part series of posts explaining data privacy settings starting here; there’s also a great explanation of what’s going on behind the scenes, written by the dev team, that I link to here; and I wrote a post here showing an example of the performance impact of data privacy checks. As I say in that post, however, do not play around with these settings unless you really know what you’re doing.

If you’re ok with writing some M code, using the technique I blogged about here to implement basic query folding on a web data source can be important – in the example in the video, it was doing this that reduced the number of calls to the web service from six to three.
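The technique boils down to passing filter values to the web service in the request itself, using the Query option of Web.Contents, rather than downloading everything and filtering afterwards. Here’s a minimal sketch against a hypothetical web service – the URL and parameter name are made up:

let
    // the filter is applied by the web service itself because the
    // customerId value is folded into the query string of the request
    Source = Web.Contents(
        "https://api.example.com",
        [
            RelativePath = "orders",
            Query = [customerId = "123"]
        ]
    ),
    Result = Json.Document(Source)
in
    Result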

Some transformations can also lead to data being read from a data source more than once. For example, in this post I showed how a merge transformation leads to multiple reads from the same Excel file and how to stop this happening and drastically improve performance.

Moving on, another important lesson to learn is how referenced queries are evaluated. This is one of the most counter-intuitive things about Power Query! With the set of referenced queries used in my demo – where three queries called Referenced Query 1, Referenced Query 2 and Referenced Query 3 each reference a query called Call Web Service and are loaded into a dataset, while Call Web Service itself is not loaded – most people would assume that when a refresh takes place the following happens:

  1. The Call Web Service query is run, getting the data from the web service
  2. The data returned is then passed to Referenced Query 1, Referenced Query 2 and Referenced Query 3

This is wrong. In fact what happens is this:

  1. Referenced Query 1 is refreshed. Because it references the Call Web Service query, that means the Call Web Service query is also evaluated.
  2. Referenced Query 2 is refreshed. Because it references the Call Web Service query, that means the Call Web Service query is also evaluated.
  3. Referenced Query 3 is refreshed. Because it references the Call Web Service query, that means the Call Web Service query is also evaluated.

This explains why the Call Web Service query is still being evaluated three times once the M code above has been added to it. You can read a detailed explanation of how referenced queries are evaluated here.
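It’s worth remembering that a referenced query contains nothing but a pointer to the query it references; this sketch is the entire M code of a query like Referenced Query 1:

let
    // the whole query: it simply evaluates Call Web Service
    Source = #"Call Web Service"
in
    Source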

There is something we can do to help here though: the Power Query engine can cache the results returned from some types of data source, such as web services, but to take full advantage of this caching in this particular scenario you have to turn off the Enable Parallel Loading Of Tables option in the Options dialog, as I show here. With that done, each of the three referenced queries runs one after the other instead of in parallel. When the first of these queries runs it calls the Call Web Service query, which gets data from the web service; when the next two queries run and trigger two more evaluations of Call Web Service, the data for the call to the web service has already been cached. This caching can be turned off using the IsRetry option in Web.Contents, as Curt Hagenlocher explains in a comment here.
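To show what controlling the cache might look like in M, here’s a hedged sketch using the IsRetry option mentioned above – the URL is made up:

let
    // IsRetry=true makes the engine ignore any cached response,
    // so every evaluation results in a call to the web service
    Source = Web.Contents(
        "https://api.example.com/data",
        [IsRetry = true]
    ),
    Result = Json.Document(Source)
in
    Result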

The final thing shown in the video is that the infamous Table.Buffer M function does not improve performance in this case and in fact makes performance worse. If you use Table.Buffer inside the Call Web Service query it will indeed buffer the result of the call to the web service into memory, but the data that is buffered cannot be shared between separate query executions – so the buffering happens three times, once for each time Call Web Service is evaluated by the three referenced queries. Again this is something discussed in this post. Of course there are plenty of other scenarios where Table.Buffer will help performance, but these will only be in cases where the same query requests data from the same data source more than once.
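To make the scope of the buffer concrete, here’s a sketch – GetDataFromWebService is a hypothetical function standing in for the real call:

let
    // hypothetical function that calls the web service
    Source = GetDataFromWebService(),
    // Table.Buffer keeps a copy of the table in memory, but only for the
    // duration of this evaluation; each downstream query that evaluates
    // this query triggers its own, separate buffering
    Buffered = Table.Buffer(Source)
in
    Buffered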

Phew, that was a lot. I don’t pretend to know everything about when and why Power Query gets data from a data source multiple times but this is the sum total of my knowledge right now. I hope it’s useful!

[Update: it’s just been pointed out to me, quite rightly, that the names of my queries are confusing. For example, “Referenced Query 1” is the query doing the referencing, not the query being referenced – which is Call Web Service. I haven’t changed the names because I want the blog post to remain consistent with the video, and I hope the diagrams and the context make everything clear]

Power BI And Case Sensitivity

Case sensitivity is one of the more confusing aspects of Power BI: while the Power Query engine is case sensitive, the main Power BI engine (that means datasets, relationships, DAX etc) is case insensitive. Imke Feldmann shows how to make Power Query case insensitive through custom M here; in this blog post I’m going to concentrate on what case insensitivity means in datasets and DAX and show a way to (kind of) work around it.
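For illustration, here’s a minimal sketch of the comparer-based approach (not Imke’s implementation): many M functions take an optional comparer, and Comparer.OrdinalIgnoreCase makes them treat “A” and “a” as equal:

let
    Source = {"A", "a", "B", "b"},
    // case sensitive by default: returns all four values
    DefaultDistinct = List.Distinct(Source),
    // case insensitive: returns {"A", "B"}
    IgnoreCaseDistinct = List.Distinct(Source, Comparer.OrdinalIgnoreCase)
in
    IgnoreCaseDistinct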

Starting with a simple example, let’s say you have a data source that contains data that looks like this:


This is what the data looks like in the Power Query Editor which is, as I said, case sensitive – meaning that it sees “A” and “a” as different characters, so the table contains six distinct values, one on each row. This could be data from any data source; to make things easy I’ve used the following M query to generate this table:

let
    Source = 
    #table(
        type table 
        [MyTextColumn = text], 
        {
            {"A"},
            {"a"},
            {"B"},
            {"b"},
            {"C"},
            {"c"}
        })
in
    Source

When this data gets loaded into a Power BI dataset, however, you’ll see something different in the Data pane of the main Power BI window.

Because the main Power BI engine is case insensitive – so, for example, “a” and “A” are considered the same character – when this data is loaded in it only sees three distinct characters, and you can’t be sure whether it’s the lower case or upper case version that gets stored. This is just the way it works: while an instance of Analysis Services Tabular, which is basically the same engine found in Power BI, can be either case sensitive or case insensitive, this option isn’t available in Power BI. There is an idea to support case sensitivity as an option in Power BI here and, while there are some valid scenarios where you need it, in my experience case sensitivity causes a lot of problems so I can see why it wasn’t a priority.

What’s more you’ll find that DAX calculations are case insensitive too. If you try the following DAX expression to create a calculated table:

Demo Table = 
DATATABLE(
    "MyTextColumn",
    STRING,
    {
        {"A"},
        {"a"}
    }
)

You’ll get a table where the same character appears on both rows.

DAX measures behave in a similar way. Given a table called 'Demo Table 2' containing a numeric column called MyIntColumn, if you use the following measure in a table visual:

Demo Measure = 
IF(
    SELECTEDVALUE('Demo Table 2'[MyIntColumn])=1,
    "A",
    "a"
)

…you’ll see the same character displayed on every row, whichever branch of the IF is evaluated.

The only way you can work around this case insensitivity is to make text values that would otherwise look the same to the Power BI engine somehow different. One way of doing this would be to add some extra characters to your text. You might think adding some extra spaces would be the way to go; revisiting the first M query shown above, you could add a space to every lower case character in the table like so:

let
    Source = 
    #table(
        type table 
        [MyTextColumn = text], 
        {
            {"A"},
            {"a "},
            {"B"},
            {"b "},
            {"c "},
            {"C"}
        })
in
    Source

But this doesn’t work, because another little-known behaviour of the Power BI engine is that all leading and trailing spaces are trimmed when text is loaded. Anyway, spaces may not be visible but they still take up… well, space. A better option – and one that actually works – is to use the Unicode Zero-Width Space character and the M Character.FromNumber function instead, like so:

let
    Source = 
    #table(
        type table 
        [MyTextColumn = text], 
        {
            {"A"},
            {"a" & Character.FromNumber(8203)},
            {"B"},
            {"b" & Character.FromNumber(8203)},
            {"c" & Character.FromNumber(8203)},
            {"C"}
        })
in
    Source

The great thing about this character is that although it’s there, it’s invisible and takes no space. When you load this last M query into Power BI you see all six rows displayed with their case apparently preserved, because every value is now distinct.

Let’s finish off with a more advanced example of how to use this rather excellent character. Say you have some source data returned by an M query called SourceData, with a text column called OriginalText containing values in mixed case. Of course, when this table is loaded into Power BI, any values that differ only by case are treated as identical.

BUT, the following M query takes this table and for each row goes through the text in the OriginalText column and adds a zero-width space after each lower case character:

let
    Source = 
    SourceData,
    ToList = 
    Table.AddColumn(
        Source, 
        "Chars", 
        each Text.ToList([OriginalText])
        ),
    LowerCaseChars = 
    {"a".."z"},
    AddInvisibleChars = 
    Table.AddColumn(
        ToList, 
        "AddInvisibleChars", 
        each 
        List.Transform(
            [Chars], 
            each 
            if 
            List.Contains(LowerCaseChars, _) 
            then 
            _ & Character.FromNumber(8203) 
            else _
            )
            ),
    RecombineList = 
    Table.AddColumn(
        AddInvisibleChars, 
        "OutputText", 
        each 
        Text.Combine([AddInvisibleChars]), 
        type text
        ),
    RemovedOtherColumns = 
    Table.SelectColumns(
        RecombineList,
        {"OutputText"}
        )
in
    RemovedOtherColumns

When you load the output of this query into Power BI, because the zero-width spaces after each lower case character make Power BI see each piece of text as being different, it looks like case has been preserved even when it hasn’t.

You can download the Power BI file for this last example here.

[Thanks to Greg Galloway, Akshai Mirchandani and Jeffrey Wang for their help with this post]
