Forecasting With Power BI Data In Excel Using Cube Functions And The FORECAST.ETS Function, Part 1

One of the reasons I love using the Excel cube functions to get Power BI data into Excel, instead of PivotTables, is that they make it much easier to use other native Excel features and functions in combination with your data. One Excel function I’ve always been curious about is FORECAST.ETS, which allows you to forecast values from time series data. Here’s what the docs have to say about this function:

[This function] calculates or predicts a future value based on existing (historical) values by using the AAA version of the Exponential Smoothing (ETS) algorithm. The predicted value is a continuation of the historical values in the specified target date, which should be a continuation of the timeline. You can use this function to predict future sales, inventory requirements, or consumer trends.

Recently I started playing around with this function to see how it could be used with cube functions, and since I learned so many interesting things I thought it would make a good series of blog posts. In this post I’ll look at a basic scenario, and in my next post I’ll show how to use dynamic arrays (including one of the newly-released functions) to build something more complex. For these posts I’ll use a dataset containing Land Registry Price Paid data as my source; the exact contents don’t matter much, but it contains a Date dimension table, a fact table with one row for each property (or “real estate” for you Americans) sale in England and Wales in 2018, and a measure called Count Of Sales that returns the aggregated number of property sales.

Let’s start with a worksheet containing cube functions that show the Count Of Sales measure broken down by date for the first seven days of 2018:

Here are the underlying formulas:
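
They follow the standard CUBEMEMBER/CUBEVALUE pattern for the “Price Paid” connection. As a rough sketch – the exact member unique names for the dates depend on how the Date table in the dataset is structured, so treat them as assumptions – the date member in B3 and the measure header in C2 would be something like:

=CUBEMEMBER("Price Paid", "[Date].[Date].&[2018-01-01T00:00:00]")

=CUBEMEMBER("Price Paid", "[Measures].[Count Of Sales]")

…and the value in C3 is then returned by a CUBEVALUE formula that references those two cells:

=CUBEVALUE("Price Paid", $B3, C$2)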

How can we use FORECAST.ETS on this data? Well, to keep things super-simple I manually entered the next three dates (ie the 8th, 9th and 10th of January) in cells B10, B11 and B12 like so:

The question is, how can we add forecast values to the range C10:C12? The first thing I tried was adding the following formula to C10:

=FORECAST.ETS($B10, $C$3:$C$9, $B$3:$B$9)

It didn’t work though…

The problem here is that the dates don’t come through from Power BI as Excel dates and the measure values don’t come through as numeric values – they arrive as text. This can be fixed by using the VALUE and DATEVALUE Excel functions like so:

=FORECAST.ETS($B10, VALUE($C$3:$C$9), DATEVALUE($B$3:$B$9))

Which can then be dragged down to C11 and C12 too:

We now have something working but there are still a few more problems to solve:

  • The forecasts come back as decimal values, which don’t make sense for this measure. This can be fixed, for example, by using the INT function to round the values down to whole numbers.
  • From looking at the data I can see there is a clear weekly pattern in the sales – very few properties are sold at weekends, whereas Fridays are the most popular day to sell – and this can be added as a hint to the function by setting the Seasonality argument of FORECAST.ETS to 7.
  • Depending on how you slice the data, some dates may return a blank value for Count Of Sales, and if there are too many blanks then FORECAST.ETS may not be able to calculate a forecast and will return a #VALUE! error instead. I had hoped that setting the Data Completion argument of FORECAST.ETS to 1 would fix this but it didn’t; one solution is to alter the formulas in C3:C9 to return 0 when CUBEVALUE returns a blank value. The standard way to do this is to use the ISNUMBER function as detailed on this thread (sketched just after this list), but I realised this was a great opportunity to use the new(ish) LET function to do things more elegantly.
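
For reference, the ISNUMBER approach would mean something like the following in each cell – note that it has to reference CUBEVALUE twice, once to test the result and once to return it (a sketch only, shown without the slicer reference added later):

=IF(ISNUMBER(CUBEVALUE("Price Paid",$B3,C$2)), CUBEVALUE("Price Paid",$B3,C$2), 0)

LET avoids the double evaluation by storing the CUBEVALUE result in a variable and reusing it, which is why it’s the more elegant option here.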

Here’s the final version of the forecast formula in C10 with these changes:

=INT(FORECAST.ETS($B10, VALUE($C$3:$C$9), DATEVALUE($B$3:$B$9), 7))

Here’s the new version of the formula in C3 as an example of this (with a newly added slicer reference too):

=LET(
CubeValueResult,
CUBEVALUE("Price Paid",$B3,C$2, Slicer_Property_Transactions1),
IF(ISNUMBER(CubeValueResult), CubeValueResult, 0)
)

That’s the simple scenario working then. But wouldn’t it be cool if we could vary the number of days of source data or the number of days forecast and put everything together into one big dynamic array so it can be charted easily? I’ll show you how in my next post!

New Options For The Table.Buffer Function In Power Query

The March 2022 release of Power BI Desktop includes some new options for the Table.Buffer M function that can be set in a new second parameter. The options are of type BufferMode.Type, and the built-in documentation for this type lists two allowed values:

  • BufferMode.Eager: The entire value is immediately buffered in memory before continuing
  • BufferMode.Delayed: The type of the value is computed immediately but its contents are not buffered until data is needed, at which point the entire value is immediately buffered

Anyone with previous experience of Table.Buffer will see that BufferMode.Eager is the existing behaviour, but what is BufferMode.Delayed for?
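
Syntactically, the new options are just a record passed as the second argument to Table.Buffer. Here’s a minimal sketch showing both values – the table and step names are made up purely for illustration:

let
  // SomeTable stands in for any table-valued step in your query
  SomeTable = #table({"A"}, {{1}, {2}}),
  // BufferMode.Eager is the existing behaviour: buffer the whole table immediately
  BufferedEagerly = Table.Buffer(SomeTable, [BufferMode = BufferMode.Eager]),
  // BufferMode.Delayed computes the table's type immediately but defers buffering its contents
  BufferedLazily = Table.Buffer(SomeTable, [BufferMode = BufferMode.Delayed])
in
  BufferedLazily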

It turns out that it’s there to make development faster. Consider the following M query, which loads data from a CSV file with seven columns and a million rows in it, and then uses Table.Buffer to buffer that table into memory:

let
  Source = Csv.Document(
    File.Contents(
      "C:\NumbersMoreColumns.csv"
    ),
    [
      Delimiter  = ",",
      Columns    = 7,
      Encoding   = 65001,
      QuoteStyle = QuoteStyle.None
    ]
  ),
  #"Promoted Headers"
    = Table.PromoteHeaders(
    Source,
    [PromoteAllScalars = true]
  ),
  #"Changed Type"
    = Table.TransformColumnTypes(
    #"Promoted Headers",
    {
      {"A", Int64.Type},
      {"B", Int64.Type},
      {"C", Int64.Type},
      {"D", Int64.Type},
      {"E", Int64.Type},
      {"F", Int64.Type},
      {"G", Int64.Type}
    }
  ),
  BufferTable = Table.Buffer(
    #"Changed Type"
  )
in
  BufferTable

When I refresh this query in Power BI Desktop on my PC I see the “Evaluating…” message for 20 seconds before the data starts to load:

If, however, I add the second parameter [BufferMode = BufferMode.Delayed] to Table.Buffer like so:

let
  Source = Csv.Document(
    File.Contents(
      "C:\NumbersMoreColumns.csv"
    ), 
    [
      Delimiter  = ",", 
      Columns    = 7, 
      Encoding   = 65001, 
      QuoteStyle = QuoteStyle.None
    ]
  ), 
  #"Promoted Headers"
    = Table.PromoteHeaders(
    Source, 
    [PromoteAllScalars = true]
  ), 
  #"Changed Type"
    = Table.TransformColumnTypes(
    #"Promoted Headers", 
    {
      {"A", Int64.Type}, 
      {"B", Int64.Type}, 
      {"C", Int64.Type}, 
      {"D", Int64.Type}, 
      {"E", Int64.Type}, 
      {"F", Int64.Type}, 
      {"G", Int64.Type}
    }
  ), 
  BufferTable = Table.Buffer(
    #"Changed Type", 
    [BufferMode = BufferMode.Delayed]
  )
in
  BufferTable

Then, when I run my refresh, the “Evaluating…” message only appears very briefly before the data starts to load:

It’s important to stress that after the “Evaluating…” phase the data load takes exactly the same amount of time – it’s only the “Evaluating…” phase that is faster. Nonetheless this can save you a lot of time as a developer. I have been told that when these options become available in dataflows they will make validation (which occurs when you close Power Query Online after editing a dataflow) much faster too – in fact this functionality was developed partly to solve the dataflow validation problem.

[Thanks to Curt Hagenlocher for this information]

Update: something I should make clear is that this functionality is only useful for people who are already using Table.Buffer in their queries. If you’re not using Table.Buffer already then these changes won’t be of any benefit or interest.

The ExtAuth Trace Event In Power BI

When you’re looking at the Log Analytics data for your Power BI Premium dataset, or studying a Profiler trace, you may see a command called ExtAuth in the OperationDetailName column:

It is documented here as “An internal service-generated command to perform authentication” – and as such it’s not something you have any control over. It may be triggered when a report is opened or a dataset refresh is started.

However, you may wonder why it’s sometimes so slow. This is because datasets in the Power BI Service are not held in memory 100% of the time (they are sometimes paged out to disk), although they do need to be in memory before they can be queried or refreshed. When a dataset is not held in memory but the Power BI Service needs to connect to it, the dataset is loaded into the Analysis Services engine and into memory, and this can take some time, especially if the dataset is large. The ExtAuth event is often the event that triggers the load of a dataset into memory, which is why the time required to load the dataset is associated with it.

[Thanks to Akshai Mirchandani, Kay Unkroth and Rui Romano for the information in this post]

Why Not All SQL Queries Can Be Used As A Source In Power BI DirectQuery Mode

A few days ago I was tagged on a thread on Twitter: my old pal Mim was upset to find that he couldn’t use a SQL query with a Common Table Expression as the source of a table in DirectQuery mode. He worked out why without my help, but I thought it was worth writing up an explanation for other people who might be struggling with the same problem.

In Power BI DirectQuery mode you can – for most data sources – use your own SQL query as the source for a table in your dataset. For example, when connecting to the AdventureWorksDW 2017 sample database in SQL Server using DirectQuery mode, I can use the following SQL query as the source of a table:

SELECT 
FullDateAlternateKey, 
EnglishDayNameOfWeek, 
EnglishMonthName
FROM DimDate

Now, let’s say I use this table in a dataset and build the following report with two slicers and a table, and with a month and a day name selected in the slicers:

The SQL generated by Power BI to populate the table looks like this:

SELECT TOP (501) [t0].[FullDateAlternateKey]
FROM
(
  (
    SELECT
      FullDateAlternateKey,
      EnglishDayNameOfWeek,
      EnglishMonthName
    FROM DimDate
  )
) AS [t0]
WHERE
(
  ([t0].[EnglishDayNameOfWeek] = N'Sunday')
  AND
  ([t0].[EnglishMonthName] = N'April')
)
GROUP BY
  [t0].[FullDateAlternateKey],
  [t0].[EnglishDayNameOfWeek],
  [t0].[EnglishMonthName]
ORDER BY
  [t0].[EnglishDayNameOfWeek] ASC,
  [t0].[EnglishMonthName] ASC,
  [t0].[FullDateAlternateKey] ASC

Notice how the original SQL query I used as the source of the table has become a subquery inside this query. This is how Power BI generates its SQL, and it’s why there are limitations on the SQL you can use in DirectQuery mode: not all SQL queries that run standalone can be used as subqueries.

Take the following variation on the SQL above for example:

SELECT
FullDateAlternateKey,
EnglishDayNameOfWeek,
EnglishMonthName
FROM DimDate
ORDER BY
EnglishMonthName

It’s the same query as before but with an ORDER BY clause. It works fine in the Power Query Editor but when you go to your report you’ll see the following error:

“Microsoft SQL: The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries and common table expressions unless TOP, OFFSET or FOR XML is also specified”

This error is returned by SQL Server when Power BI tries to run the SQL it has generated – and as you will have guessed by now, that’s because Power BI is using your query, complete with its ORDER BY clause, inside its own subquery. You can’t put an ORDER BY clause in a subquery in T-SQL except in the scenarios mentioned in the error message. The same goes for Common Table Expressions: a query that starts with a WITH clause can’t be used inside a subquery either, which is why Mim’s query failed.

What can you do to work around this? There is really just one answer: model your data the way Power BI likes it (ie as a star schema) before it gets to Power BI – it’s the now-famous Roche’s Maxim. Even if you aren’t running into this limitation, using SQL queries as the source of a table is a bad idea because it makes maintenance difficult and because you end up doing any expensive transformations at query time, paying the performance penalty over and over again. And yes, I know, some of you don’t have permissions to create views or get the data modelled correctly in the source, and I know it’s tough, but that’s the way it is.
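
If you do have the necessary permissions, one practical way of doing this for a case like the one above is to wrap the query up as a view in the source database and connect to that instead. A minimal sketch, reusing the columns from the earlier example (the view name is made up):

CREATE VIEW dbo.DimDateForReporting AS
SELECT
FullDateAlternateKey,
EnglishDayNameOfWeek,
EnglishMonthName
FROM DimDate;

Power BI can then treat the view just like a table, and any sorting happens in the SQL that Power BI itself generates at query time rather than in the view definition.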
