Seventeenth Blog Birthday

Every year, on the anniversary of the first-ever post on this blog, I write a post reflecting on what has happened to me professionally in the past year. In the past this has meant I’ve written about learning some new technology or language (yes, DAX and M were new once), dealing with a business issue back when I had my own company, or more recently adjusting to life as a full-time employee at Microsoft. This year’s new challenge has been becoming a manager for the first time.

One of the great things about working for Microsoft is that, unlike many other companies, you don’t need to become a manager to progress in your career. However, the Power BI CAT team is growing and when I was given the opportunity to manage a part of it I thought, why not? I’m in my late 40s and have never had anyone reporting to me before – in my defence I spent many years working for myself – so I thought it would be good to get out of my comfort zone and try something new. It helped that I have a very supportive manager and that several other people on the team made the transition to management at the same time, which means we’re all learning together. I have particularly enjoyed recruiting new people for my team from inside and outside Microsoft: I have some extremely talented people on my team already, with several more due to start in early 2022.

This doesn’t mean I have left technical things behind though, just that I’m spending less time doing technical things and more time managing other people who do technical things. I think it’s important to stay as technical as I can and to maintain some direct contact with customers in order for me to be an effective manager; again, one of the things I like about Microsoft and my team in particular is that I haven’t had to make a binary choice between being a manager or being technical. That said I have had to accept that, more than before, there are problems I can’t help solve and shiny new things I don’t have time to learn about, and that has been hard.

None of this will affect this blog’s focus on Power BI (I’m certain no-one is interested in my thoughts about management…) or how often I blog but it will accelerate a trend that I suspect has been apparent for the last year or so. The primary motivation for me to blog has always been my own education: writing down information I can’t find anywhere else means it doesn’t get lost and explaining it to other people helps me understand it myself. Now, though, this is pretty much the only reason for me to blog, which means even more of the obscure factoids about Power Query data privacy settings and even less of the click-friendly top ten lists about Power BI/Excel integration type of content. I don’t think many people came here for the introductory tutorials though, did they?

Power BI DirectQuery Query Labels For SQL Server-Related Sources

If you’re using Power BI in DirectQuery mode against a SQL Server-related data source (i.e. SQL Server on-prem, Azure SQL DB or Synapse) you may have noticed a new feature that was released a few weeks ago: query labels that allow you to link a SQL query to the Power BI visual that generated it.

There’s nothing you need to do to enable it – it happens automatically. Here’s an example of a TSQL query generated by a DirectQuery mode dataset with it in:

You can see that an OPTION clause in the query adds a label that contains three GUIDs that identify the dataset, report and visual that the query was generated for. These are the same GUIDs that you’ll see used for this purpose in other places, for example Log Analytics. This post has more details on how you can work out which datasets, reports and visuals these GUIDs relate to.

As I said, this only works for SQL Server-related sources at the moment, but if you think this would be useful for other sources (and I’m sure you do…) please leave a comment below!

Simulating Slow Data Sources In Power BI

As a postscript to my series on Power BI refresh timeouts (see part 1, part 2 and part 3) I thought it would be useful to document how I was able to simulate a slow data source in Power BI without using large data volumes or deliberately complex M code.

It’s relatively easy to create an M query that returns a table of data after a given delay. For example, this query returns a table with one column and one row after one hour and ten minutes:

let
  Source = Function.InvokeAfter(
    () => #table({"A"}, {{1}}), 
    #duration(0, 1, 10, 0)
  )
in
  Source

Some notes:

  • I’m using #table to return the table without having to connect to a data source. More details on how to use #table can be found here.
  • The delay is achieved using the Function.InvokeAfter M function, with the amount of time to wait specified using #duration.
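
As a quick sanity check on the delay, here is a minimal sketch showing how the four arguments to #duration (days, hours, minutes and seconds, in that order) add up. The step names are ones I have made up purely for illustration:

let
  // #duration(days, hours, minutes, seconds)
  Delay = #duration(0, 1, 10, 0),
  // Convert to a single unit to check the total: 1 hour + 10 minutes
  TotalMinutes = Duration.TotalMinutes(Delay)
in
  TotalMinutes

This returns 70, i.e. one hour and ten minutes.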

A more interesting problem is how to create an M query that, instead of waiting for a given duration and then returning a table immediately, returns the rows of a table one at a time with a delay between each row. Here’s a query that does that, returning ten rows at one-second intervals:

let
  NumberOfRows = 10,
  DaysToWait = 0,
  HoursToWait = 0,
  MinutesToWait = 0,
  SecondsToWait = 1,
  Source = #table(
    {"A"},
    List.Transform(
      {1 .. NumberOfRows},
      each Function.InvokeAfter(
        () => {1},
        #duration(
          DaysToWait,
          HoursToWait,
          MinutesToWait,
          SecondsToWait
        )
      )
    )
  )
in
  Source

Last of all, to simulate a slow SQL Server data source – not being much good at TSQL at all – I borrowed some code from this thread on Stack Overflow to create a function that returns a scalar value after a specified number of seconds:

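CREATE FUNCTION [dbo].[ForceDelay](@seconds int) RETURNS int AS
BEGIN
  -- Work out the time at which the function is allowed to return
  DECLARE @endTime datetime2(0) = DATEADD(SECOND, @seconds, GETDATE());
  -- Busy-wait until then (WAITFOR is not allowed inside a function);
  -- the SET is a no-op that just gives the loop a body
  WHILE (GETDATE() < @endTime)
    SET @endTime = @endTime;
  RETURN 1;
END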

I showed how to call this function from Power Query using a native SQL query here.

Troubleshooting Power BI Timeouts, Part 3: The External Command Timeout

In the first post in this series I showed how any Power BI dataset refresh started via the Power BI portal or API is limited to 2 hours in Shared capacity and 5 hours in Premium capacity, and how you could work around that by running a refresh via Premium’s XMLA endpoint feature. In the second post in this series I showed how some M functions allow you to set timeouts. However, even if you initiate a refresh via the XMLA endpoint you may still get a timeout error and in this post I’ll discuss another reason why: the External Command Timeout.

This property is a hangover from Analysis Services (you can see it documented here). It represents the amount of time the Analysis Services engine inside Power BI will wait to get data from a data source. How it behaves exactly depends on the data source: it may limit the amount of time it takes to get the first row of data from the source or it may limit the amount of time it takes to get all the rows of data. In Power BI Premium it is set to five hours, which means that no single partition can take more than about five hours to refresh. In the first post in this series I worked around this by creating a dataset with multiple partitions, each of which took about an hour to refresh. However, when I used SQL Server Management Studio to refresh a dataset with a single partition that takes more than five hours through the XMLA Endpoint, I got the following error:

The error message here is:

Timeout expired. The timeout period elapsed prior to completion of the operation.. The exception was raised by the IDbCommand interface.

In this case I saw the same error in the Error event in Profiler:

…and in other cases, when testing a different source, I got a different error in Profiler in the Progress Report Error event:

The message here is:

Error processing partition ‘<pii>SlowCommand-2114bb81-69d3-4fe4-9d54-6b2661b0c497</pii>’ of table ‘<pii>SlowCommand</pii>’ [Internal Error Code: ‘0xc112001a’].

There’s no way to avoid the External Command Timeout. Instead, what you need to do is either change your partitioning strategy so each partition refreshes in under five hours or tune your data source, M code or gateway (if you’re using one) so that data is returned to Power BI faster.

In Shared capacity I believe the External Command Timeout is set to two hours (again, to match the overall refresh timeout) but it’s much less important there because you can’t create partitions manually (the only way a dataset in Shared can be partitioned is by setting up incremental refresh) and there’s no XMLA Endpoint so there’s no way to work around the two hour overall refresh limit anyway.

[Thanks, as always, to Akshai Mirchandani for a lot of the information in this post]

Troubleshooting Power BI Timeouts, Part 2: Timeouts Specified In Power Query Functions

In the first post in this series I showed how the Power BI Service limits the total amount of time a dataset refresh can take, except when you initiate the refresh via an XMLA Endpoint. In this post I’ll look at the various timeouts that can be configured in Power Query functions that are used to access data.

Every time a Power BI Import mode dataset connects to a data source it goes through a Power Query query, and inside the code of that Power Query query will be an M function that connects to a specific type of data source. Most – but not all – of these M functions have the option to set timeouts. The M functions that connect to relational databases (for example Sql.Database which is used to connect to SQL Server-related sources, or Odbc.DataSource which is used to connect to ODBC sources) typically allow you to configure two types of timeout:

  • A connection timeout, which specifies how long to wait when trying to open a connection to the data source
  • A command timeout, which specifies how long the query to get data from the source is allowed to run

Some other functions have other timeouts more appropriate to the data source they access: for example Web.Contents and OData.Feed have a Timeout property for the HTTP requests they make behind the scenes. Other functions (for example Excel.Workbook) have no timeout-related properties that you can set at all. You should check the documentation, either online or using the #shared intrinsic variable in the Power Query Editor, to see what options are available and what the default timeouts are.
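
To compare the option names side by side, here is a minimal sketch that sets an HTTP timeout on Web.Contents and both a connection timeout and a command timeout on Sql.Database. The URL is a placeholder and the durations are arbitrary values chosen for the example, not recommendations:

let
  // Placeholder URL: allow each HTTP request up to five minutes
  WebSource = Web.Contents(
    "https://example.com/api/data",
    [Timeout = #duration(0, 0, 5, 0)]
  ),
  // Wait up to 30 seconds to open the connection,
  // then let the query run for up to 30 minutes
  SqlSource = Sql.Database(
    "localhost",
    "AdventureWorksDW2017",
    [
      ConnectionTimeout = #duration(0, 0, 0, 30),
      CommandTimeout = #duration(0, 0, 30, 0)
    ]
  )
in
  // Only SqlSource is returned, so WebSource is never evaluated here
  SqlSource

In a real query you would keep only the step for the source you are actually using; the two are shown together here just to illustrate the different options.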

Here’s a simple example of how to set a timeout when connecting to SQL Server. First of all, I created a scalar function called ForceDelay in TSQL that returns a value of 1 after a specified number of seconds, so the query:

select [dbo].[ForceDelay](10) as MyColumn

…takes 10 seconds to run.

When you connect to SQL Server in Power BI/Power Query you can paste a SQL query in the connection dialog under Advanced Options; when you do so you’ll also see the option to set the command timeout in minutes:

Here’s the M code generated when you use the SQL query above, set the Command timeout to 1 minute and click OK:

let
  Source = Sql.Database(
    "localhost",
    "AdventureWorksDW2017",
    [
      Query
        = "select [dbo].[ForceDelay](10) as MyColumn",
      CommandTimeout = #duration(
        0,
        0,
        1,
        0
      )
    ]
  )
in
  Source

Notice that the CommandTimeout option has been set on the Sql.Database function, and that the value passed to it is a duration of one minute defined using #duration. Since the SQL query takes 10 seconds to run and the timeout is 1 minute, it runs successfully.

However, if you set the CommandTimeout option to 5 seconds like so:

let
  Source = Sql.Database(
    "localhost", 
    "AdventureWorksDW2017", 
    [
      Query
        = "select [dbo].[ForceDelay](10) as MyColumn", 
      CommandTimeout = #duration(
        0, 
        0, 
        0, 
        5
      )
    ]
  )
in
  Source

…then the query will take longer than the timeout and you’ll see the following error in the Power Query Editor:

The error message is:

DataSource.Error: Microsoft SQL: Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding.

If you hit the timeout when refreshing in the main Power BI window you’ll see the same message in the refresh dialog box:

So, as you can see, if you are working with large data volumes or slow queries you will need to be aware of the default timeouts set in the M functions you’re using and alter them if need be.

Troubleshooting Power BI Timeouts, Part 1: The Two/Five Hour Import Refresh Limit

If you’re working with a large Power BI dataset and/or a slow data source in Import mode it can be very frustrating to run into timeout errors after you have already waited a long time for a refresh to finish. There are a number of different types of timeout that you might run into, and in this series I’ll look at a few of them and discuss some of the ways you can work around them.

In this post I’ll look at one of the most commonly-encountered timeouts: the limit on the maximum length of time an Import mode dataset refresh can take. As documented here these limits are:

  • Two hours for an import-mode dataset in Shared capacity
  • Five hours for an import-mode dataset in a Premium or PPU capacity

If you are using Premium you can connect to your dataset using SQL Server Profiler and run a trace when the refresh starts and you’ll see the timeout set as a property of the TMSL command that runs the refresh (18000 seconds = 5 hours):

Here’s an example of the message you will see in the Refresh History dialog if your dataset refresh takes more than five hours in PPU:

Once again you get more details in a Profiler trace. When the refresh times out you’ll see an Error event with a long message that starts with the following text:

Timeout expired. The timeout period elapsed prior to completion of the operation.. The exception was raised by the IDbCommand interface. The command has been canceled..

There is a way to work around this limit if you’re using Premium or PPU: it only applies to refreshes (scheduled or manual) that you start from the Power BI portal or via the REST API. If you start your refresh by running a TMSL script via the XMLA Endpoint (for example via SQL Server Management Studio or from PowerShell) then the limit does not apply. This is because you’re executing your own TMSL script rather than having the Power BI Service execute its own refresh command – with a timeout set – like the one shown in the screenshot above.

For example, here’s the same dataset from the screenshot above but refreshed successfully from SQL Server Management Studio (notice the Type column says “Via XMLA Endpoint”) and with a duration of just over five hours:

There are a couple of blog posts out there showing how you can implement a refresh strategy using the XMLA Endpoint; this post from Marc Lelijveld and Paulien van Eijk is a great example of how to do this using Azure Automation and Azure Data Factory.

You should also consider tuning any refresh that takes a long time: after tuning, it may fall well within the two/five hour limit. There are a lot of things to consider when tuning dataset refresh; I did a conference session on this topic last year (you can watch the recording here) covering issues like data modelling, query folding in Power Query and the use of dataflows, but it doesn’t cover everything and I have learned a lot even since then. In my opinion one of the most important things you can do to improve refresh performance for very large Import datasets is to partition your fact tables and increase the amount of parallelism, as I describe here. One customer I worked with was able to reduce their refresh time from four hours down to one hour using this technique:

As I said, though, there are many different types of timeout that you may encounter – so even if you refresh via the XMLA Endpoint it may still time out for another reason.
