Troubleshooting Power BI Timeouts, Part 1: The Two/Five Hour Import Refresh Limit

If you’re working with a large Power BI dataset and/or a slow data source in Import mode it can be very frustrating to run into timeout errors after you have already waited a long time for a refresh to finish. There are a number of different types of timeout that you might run into, and in this series I’ll look at a few of them and discuss some of the ways you can work around them.

In this post I’ll look at one of the most commonly encountered timeouts: the limit on the maximum length of time an Import mode dataset refresh can take. As documented here, these limits are:

  • Two hours for an Import mode dataset in Shared capacity
  • Five hours for an Import mode dataset in a Premium or Premium Per User (PPU) capacity

If you are using Premium you can connect to your dataset using SQL Server Profiler and run a trace when the refresh starts; you’ll see the timeout set as a property of the TMSL command that runs the refresh (18000 seconds = 5 hours):

Here’s an example of the message you will see in the Refresh History dialog if your dataset refresh takes more than five hours in PPU:

Once again you get more details in a Profiler trace. When the refresh times out you’ll see an Error event with a long message that starts with the following text:

Timeout expired. The timeout period elapsed prior to completion of the operation.. The exception was raised by the IDbCommand interface. The command has been canceled..

There is a way to work around this limit if you’re using Premium or PPU: it only applies to refreshes (scheduled or manual) that you start from the Power BI portal or via the REST API. If you start your refresh by running a TMSL script via the XMLA Endpoint (for example via SQL Server Management Studio or from PowerShell) then the limit does not apply. This is because you’re executing your own TMSL script rather than having the Power BI Service execute its own refresh command – with a timeout set – like the one shown in the screenshot above.
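For reference, here’s a minimal sketch of the kind of TMSL refresh command you could run from an XMLA query window in SQL Server Management Studio; the database name is a placeholder for your own dataset’s name, so this is an illustration rather than the exact script from the screenshots:

{
  "refresh": {
    "type": "full",
    "objects": [
      {
        "database": "MyDataset"
      }
    ]
  }
}

Running a command like this from SSMS or PowerShell counts as a refresh via the XMLA Endpoint, so the two/five hour limit described above does not apply to it.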

For example, here’s the same dataset from the screenshot above but refreshed successfully from SQL Server Management Studio (notice the Type column says “Via XMLA Endpoint”) and with a duration of just over five hours:

There are a couple of blog posts out there showing how you can implement a refresh strategy using the XMLA Endpoint; this post from Marc Lelijveld and Paulien van Eijk is a great example of how to do this using Azure Automation and Azure Data Factory.

You should also consider tuning any refresh that takes a long time; it could be that after tuning you fall well within the two/five hour limit. There are a lot of things to consider when tuning dataset refresh. I did a conference session on this topic last year (you can watch the recording here) covering issues like data modelling, query folding in Power Query and the use of dataflows, but it doesn’t cover everything and I have learned a lot even since then. In my opinion one of the most important things you can do to improve refresh performance for very large Import datasets is to partition your fact tables and increase the amount of parallelism, as I describe here. One customer I worked with was able to reduce their refresh time from four hours down to one hour using this technique:
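As an illustration of the partitioning and parallelism approach (a hedged sketch, not the customer’s actual script; the database, table and partition names and the maxParallelism value are all placeholders), a TMSL sequence command run via the XMLA Endpoint lets you refresh individual partitions with a higher degree of parallelism:

{
  "sequence": {
    "maxParallelism": 10,
    "operations": [
      {
        "refresh": {
          "type": "full",
          "objects": [
            { "database": "MyDataset", "table": "Sales", "partition": "Sales 2020" },
            { "database": "MyDataset", "table": "Sales", "partition": "Sales 2021" }
          ]
        }
      }
    ]
  }
}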

As I said, though, there are many different types of timeout that you may encounter – so even if you refresh via the XMLA Endpoint it may still time out for another reason.

Setting SQL Server CONTEXT_INFO In Power Query

In my quest to check out every last bit of obscure Power Query functionality, this week I looked into the ContextInfo option on the Sql.Database and Sql.Databases M functions. This option allows you to set CONTEXT_INFO in SQL Server (see here for a good article explaining what this is) and here’s an example of how to use it:

let
  Source = Sql.Database(
    "localhost", 
    "AdventureWorksDW2017", 
    [
      Query = "SELECT * FROM DIMDATE", 
      ContextInfo = Text.ToBinary(
        "Hello"
      )
    ]
  )
in
  Source

This Power Query query runs a simple SQL SELECT statement against the SQL Server Adventure Works DW 2017 database. Note that since you need to pass a binary value to the ContextInfo option, in this example I had to use the Text.ToBinary function to convert my text to binary.

Here’s what happens in SQL Server when this Power Query query is run:

Here’s a simple example of how to retrieve this data on the SQL Server side:

SELECT session_id, login_time, program_name, context_info 
FROM sys.dm_exec_sessions
WHERE session_id=57
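Since context_info is a varbinary(128) value, you may also want to convert it back to text on the SQL Server side. Here’s a minimal sketch, assuming the value was set from plain ASCII/UTF-8 text as in the M query above:

SELECT session_id, login_time, program_name,
  CONVERT(varchar(128), context_info) AS context_info_text
FROM sys.dm_exec_sessions
WHERE session_id=57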

I’ll leave it to the SQL Server experts to decide what this can be used for and no doubt to complain that it would be more useful to support SESSION_CONTEXT too – although I’ve heard that might already be used for something, so I need to do more research here…

New Book: “Extending Power BI With Python And R”

Back in September I posted about a few new Power BI-related books I was given to review for free; that post led to me getting sent another freebie book, “Extending Power BI with Python and R”, by Luca Zavarella (buy it from Amazon UK here). I found this book particularly interesting because I know very little about Python or R, but I also know that this is a really hot topic for many people and I was curious to know what problems using these languages in Power BI might solve.

I can’t comment on the quality of the Python and R advice (although I’m pretty sure Luca knows what he’s writing about), but from the point of view of a Power BI developer the book does a good job of explaining how using them allows you to do things that are difficult or impossible otherwise. There are chapters on regular expressions, calling APIs, using machine learning models and advanced visualisations. I haven’t seen any other books, videos or blog posts that cover these topics in such detail, so if you have some Python or R skills and want to make use of them in Power BI this book seems to be a good bet.

Power BI/Power Query Data Privacy Settings And Errors Caused By Nested Values

Over the past few years I’ve blogged and presented extensively on the subject of Power Query’s data privacy settings (see here for a post with links to all this content). I thought I knew everything there was to know… but of course I didn’t, and I’ve recently learned about an issue that can cause mysterious errors.

As always it’s easiest to show an example of how it can occur. Here’s a table of airport names taken from the TripPin public OData feed:

Note how the Location column contains nested values of data type Record, and note that I have not expanded this column.

Here’s another query with sales data for these airports coming from Excel:

Now let’s say we want to join these two queries together using a Merge operation in the Power Query Editor. When you do this you will be prompted to set data privacy settings on these sources (unless you have already done so at some point in the past), because you are combining data from two different sources and, since OData supports query folding, a Merge could result in data being sent from Excel to the OData source.

If you set the data privacy levels to Private on each source, like so:

…you are telling Power Query that it should never send data from these sources to any other source. As a result, Power Query has to load the data from both sources, buffer that data in memory, and do the Merge inside its own engine.

When you do the Merge everything looks normal at first:

But when you expand the Airports column you’ll see that the nested values in the Location column have been turned into the text “[Record]” and as a result can no longer be expanded.

This is because Power Query has had to buffer the values in the Airports query but it is unable to buffer nested values (I wrote about this here).

There are two ways to fix this. First of all, you can change the data privacy settings or turn them off completely. I don’t recommend turning them off completely because this is only possible in Power BI Desktop and not in the Power BI Service, and even changing the data privacy settings can lead to some unexpected issues later on. For example, if you set the data privacy levels for both sources to Public like so:

…then no buffering is needed (because data can now be sent from one source to another) and the nested values in the Location field can be expanded:

…and of course you do so:

However, people always forget that you have to set your data privacy levels again after publishing your dataset to the Power BI Service. And if you or someone else subsequently sets the data privacy levels back to Private you’ll get the following error in the output of the query:

The error message here is:

“Expression.Error: We cannot convert the value “[Record]” to type Record.”

Depending on the data type of the nested field you might get:

“Expression.Error: We cannot convert the value “[Table]” to type Table.”

or

“Expression.Error: We cannot convert the value “[List]” to type List.”

…instead.

The second way to fix the problem is easier and probably safer: you just need to expand the Location column before the Merge operation instead of after it. That way there are no nested fields present when the Merge takes place so all the values can be buffered. Here’s what the Airports table looks like after the Location column has been expanded, before the Merge:

…and here’s the output of the Merge even when the data privacy levels for both sources are set to Private:
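For completeness, here’s a minimal M sketch of the expand-then-merge pattern. The query names (Airports and Sales), the join columns and the expanded field name are assumptions based on the TripPin example above, so you would need to adjust them to match your own queries:

let
  // Expand the nested Location record BEFORE the Merge
  // ("Address" is an assumed field name from the TripPin feed)
  ExpandedAirports = Table.ExpandRecordColumn(
    Airports,
    "Location",
    {"Address"},
    {"Location.Address"}
  ),
  // The Merge now only sees scalar values, so Power Query can
  // buffer both queries and join them in its own engine
  Merged = Table.NestedJoin(
    Sales,
    {"Airport Code"},
    ExpandedAirports,
    {"IcaoCode"},
    "Airports",
    JoinKind.LeftOuter
  ),
  // Expand the joined table column to get the airport details
  Output = Table.ExpandTableColumn(
    Merged,
    "Airports",
    {"Name", "Location.Address"}
  )
in
  Output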

Is Power BI’s “Show Data Point As A Table” Feature A Security Hole?

In the last few months the following issue has been escalated up to the Power BI CAT team several times: customers have deployed reports into production and then found that users are able to see data they should not be allowed to see by using the “Show data point as a table” feature. The question is: is this a security hole? It isn’t, and in this blog post I’ll explain why and how you should think about security as something that happens on the dataset and not in the report.

Here’s a simple example of the problem. Say you have a dataset with the following table in it:

It contains sales data but the text in the Comments field is sensitive and should not be visible to everyone. If you have a report with a matrix visual in it, and you put Employee on columns and drag the Revenue field into values and sum it up (ie create an implicit measure rather than defining an explicit measure) like so:

…then an end user will be able to view the report, select a cell in the visual, right click and select “Show data point as a table” and see a table that contains unaggregated data including some of the fields from the underlying table that go to make up that value – including the Comments field.

Ooops! Of course it’s bad when an end user sees something they shouldn’t but this isn’t Power BI’s fault. As a Power BI developer it’s important to understand that visibility and security are not the same thing and that data security is something that is defined on a dataset and not in a report. You need to use features such as row-level security and object-level security to stop users seeing data they should not be allowed to see – or you should not import that data into your dataset in the first place. You can stop the “Show data point as a table” option from appearing by changing the visual you use in your report or by using an explicit measure (ie one defined using a DAX expression), but that’s still not secure and there’s no guarantee that users would not be able to see the same data some other way.

In our example, with object-level security set up to deny access to the Comments field, you can be sure that users will not be able to see that data unless they have permission. When viewing the report via a role with OLS defined, the Comments field will not appear when you use “Show data point as a table”:
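For reference, here’s a minimal sketch of how such an object-level security rule might be defined in the dataset’s TMSL/Tabular metadata (for example using Tabular Editor); the role and table names below are just placeholders:

"roles": [
  {
    "name": "Hide Comments",
    "modelPermission": "read",
    "tablePermissions": [
      {
        "name": "Sales",
        "columnPermissions": [
          {
            "name": "Comments",
            "metadataPermission": "none"
          }
        ]
      }
    ]
  }
]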

Understanding The “The operation was cancelled because of locking conflicts” Error In Power BI

If you’re working in Power BI Desktop you may sometimes find that your visuals error with the message “Couldn’t load the data for this visual. The operation was cancelled because of locking conflicts”:

Why is this happening? If you’re an old Analysis Services person like me you may be familiar with the error (the basic problem is the same) but here’s a simple explanation. If you’re making changes to your dataset in Power BI Desktop (for example editing the DAX for a measure) then Power BI has to wait for any currently-running DAX queries, that is to say the queries that get the data for your visuals, to finish before it can save those changes. However, if it has to wait too long it will kill any queries that are still running so it can go ahead and commit those changes, and when it does you’ll see the “locking conflicts” error.

I was able to recreate this error by creating a DirectQuery dataset with a single table based on a SQL query that takes one minute to run, building the report shown in the screenshot above, and then creating a new measure while the visual on the left was rendering. Even then it didn’t error consistently – which I guess is a good thing!

Now you know the cause, the next question is what can you do to avoid it? Since the problem is caused by long-running DAX queries the answer is to tune your queries to make them faster. To be honest, if you have queries that are slow enough to cause this error you already have a usability issue with your report – most DAX queries should run for no more than a couple of seconds.

Refreshing Power Pivot Excel Data Models That Use The Current Workbook As A Data Source In Excel Online

Something that got lost in all the excitement around Excel reports connected to Power BI datasets working in Excel Online is the fact that Excel reports connected to the Excel Data Model – aka Power Pivot – now work too (although I did mention it here). Right now it’s not possible to refresh the data stored in the Excel Data Model if you are connected to external data sources. However, today I noticed something nice: if you are using an Excel table in the same workbook as the source for a table in the Excel Data Model, and you make a change to the table in the worksheet, then it is possible to refresh the data in the Excel Data Model.

Here’s an example. In Excel on the Desktop I created a new workbook and added a simple table to a worksheet:

I then moved to the Power Pivot tab on the ribbon and clicked the “Add to Data Model” button to add this table to the Excel Data Model:

I then created a PivotTable connected to the Excel Data Model:

After saving the workbook to OneDrive for Business, I closed it in Excel Desktop and reopened it in Excel Online in the browser:

Finally I was able to change a value in the source table, click the Refresh Selected Connection button on the Data tab in the ribbon, and see the change reflected in the PivotTable connected to the Excel Data Model:

Of course it would be better if you could refresh external data sources too but I still think this could be useful, for example if you had multiple users updating forecasts or budgets in an Excel table in the browser and were using the Excel Data Model for reporting on this data.

Handling “Select All” For Slicers Bound To Dynamic M Parameters In Power BI

Last year I blogged about handling multi-select scenarios for dynamic M parameters. Since then support for “Select All” has been added (see here); however, because a regular slicer with no items selected behaves the same as one with all items selected, some extra logic is needed to make a slicer bound to a dynamic M parameter behave like a regular slicer.

To illustrate this, I created a simple dataset with two tables in it:

  1. A DirectQuery table linked to the AdventureWorksDW DimDate table (a simple date dimension table), stored in Snowflake
  2. An Import mode table called Day Name that contains one column and seven rows containing the names of the days of the week

I then created an M parameter called DayParam and bound it to the DayName column of the Day Name table and enabled the Multi-select and Select all options:

Here’s the M code that shows how to use the DayParam parameter to filter the DimDate table on the EnglishDayNameOfWeek column, and handle the scenario where “Select All” has been selected as well as the scenario where no item has been selected:

let
  Source = Snowflake.Databases(
    "xyz.snowflakecomputing.com",
    "DEMO_WH"
  ),
  AWORKS_Database = Source
    {
      [
        Name = "AWORKS",
        Kind = "Database"
      ]
    }
    [Data],
  PUBLIC_Schema = AWORKS_Database
    {[Name = "PUBLIC", Kind = "Schema"]}
    [Data],
  DIMDATE_Table = PUBLIC_Schema
    {[Name = "DIMDATE", Kind = "Table"]}
    [Data],
  #"Filtered Rows" =
    if DayParam = null then
      DIMDATE_Table
    else if Type.Is(
      Value.Type(DayParam),
      List.Type
    )
    then
      if DayParam = {"__SelectAll__"} then
        DIMDATE_Table
      else
        Table.SelectRows(
          DIMDATE_Table,
          each List.Contains(
            DayParam,
            [ENGLISHDAYNAMEOFWEEK]
          )
        )
    else
      Table.SelectRows(
        DIMDATE_Table,
        each [ENGLISHDAYNAMEOFWEEK]
          = DayParam
      )
in
  #"Filtered Rows"

The #"Filtered Rows" step does the following:

  • If nothing is selected in the slicer then the DayParam parameter will contain a null value, and in this case no filter will be applied to the DimDate table
  • If the DayParam parameter is of type list (when there is a multi-select in the slicer) then
    • If it’s a list containing a single text value of “__SelectAll__”, the default value returned when the slicer has “Select all” selected, then no filter will be applied to the DimDate table
    • Otherwise List.Contains will be used to filter the DimDate table by all the selected values
  • Otherwise the DayParam parameter will contain a single scalar value (because a single value has been selected in the slicer) and the DimDate table will be filtered by that value

Here’s how it works:

Multi-Value Parameters In Power Query Online

There’s a nice new feature in Power Query Online (the version of Power Query used in Dataflows): parameters of type List. You can see this as a new option in the Type dropdown in the “Manage parameters” dialog:

Why is this interesting? In the past, Power Query parameters were always single values like a date or a string; now a parameter can contain multiple values.

There’s one other new feature in Power Query Online that goes along with this: In and Not In filters, which can use these new List parameters.
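To give an idea of what this makes possible, here’s a minimal hedged sketch in M of how a List-type parameter could be used with an In-style filter; the parameter name (CountryList), server, database and column names are assumptions for illustration only:

let
  // CountryList is assumed to be a parameter of type List,
  // for example {"United Kingdom", "France", "Germany"}
  Source = Sql.Database("localhost", "AdventureWorksDW2017"),
  DimGeography = Source{[Schema = "dbo", Item = "DimGeography"]}[Data],
  // The equivalent of an "In" filter bound to the List parameter
  #"Filtered Rows" = Table.SelectRows(
    DimGeography,
    each List.Contains(CountryList, [EnglishCountryRegionName])
  )
in
  #"Filtered Rows"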

I’m sure there are other cool things you can do with this but I’ll leave them to future blog posts.

Power BI Report Accessibility: Emulate Vision Deficiencies Using Edge DevTools

I’m not an expert on Power BI report accessibility like Meagan Longoria but I do know how important an issue accessibility is. I also know how difficult it can be to remember to check for accessibility issues when building reports, which is why, when I was watching this video on new features in Edge DevTools, I was pleased to see that Edge now makes it easy to see what your report looks like when viewed by someone with vision deficiencies.

Full documentation is here but since this is a feature aimed at web developers rather than the needs of a Power BI developer, here’s a quick summary. All you need to do is open your report in Edge, hit Ctrl-Shift-I to open DevTools, go to the Rendering pane (you can find it by clicking on the chevron icon in the top menu):

The Edge DevTools menu and how to find the Rendering pane

…and then go to the “Emulate vision deficiencies” menu:

The "Emulate vision deficiencies" menu

Here’s a demo of what it does:

Animated gif showing how the "Emulate vision deficiencies" menu affects a Power BI report

Of course this isn’t the only thing you should be checking for regarding accessibility (see here for a more comprehensive list, for example) but having this feature built into Edge does remove a lot of the friction around making these checks.