Forecasting With Power BI Data In Excel Using Cube Functions And The FORECAST.ETS Function, Part 1

One of the reasons I love using the Excel cube functions to get Power BI data into Excel instead of PivotTables is because they make it much easier to use other native Excel features and functions in combination with your data. One Excel feature I’ve always been curious about is the FORECAST.ETS function, which allows you to forecast values from time series data. Here’s what the docs have to say about this function:

[This function] calculates or predicts a future value based on existing (historical) values by using the AAA version of the Exponential Smoothing (ETS) algorithm. The predicted value is a continuation of the historical values in the specified target date, which should be a continuation of the timeline. You can use this function to predict future sales, inventory requirements, or consumer trends.

Recently I started playing around with this function to see how it could be used with cube functions and since I learned so many interesting things I thought it would make a good series of blog posts. In this post I’ll look at a basic scenario and in my next post I’ll show how to use dynamic arrays (including one of the newly-released functions) to build something more complex. For these posts I’ll use a dataset containing Land Registry Price Paid data as my source; the exact contents don’t matter much but it contains a Date dimension table, a fact table containing one row for each property (or “real estate” for you Americans) sale in England and Wales in 2018 and a measure called Count Of Sales that returns the aggregated number of property sales.

Let’s start with a worksheet containing cube functions that show the Count of Sales measure broken down by date for the first seven days of 2018:

Here are the underlying formulas:

How can we use FORECAST.ETS on this data? Well, to keep things super-simple I manually entered the next three dates (ie the 8th, 9th and 10th of January) in cells B10, B11 and B12 like so:

The question is, how can we add forecast values to the range C10:C12? The first thing I tried was adding the following formula to C10:

=FORECAST.ETS($B10, $C$3:$C$9, $B$3:$B$9)

It didn’t work though…

The problem here is that the dates don’t come through from Power BI as Excel dates and the values don’t come through from Power BI as numeric values. This can be fixed by using the VALUE and DATEVALUE Excel functions like so:

=FORECAST.ETS($B10, VALUE($C$3:$C$9), DATEVALUE($B$3:$B$9))

Which can then be dragged down to C11 and C12 too:

We now have something working but there are still a few more problems to solve:

  • The forecasts come back as decimal values, which don’t make sense for this measure. This can be fixed by the use of the INT function, for example, to round the values to integers.
  • From looking at the data I can see there is a clear weekly pattern in the sales – very few properties are sold at weekends, whereas Fridays are the most popular day to sell – and this can be added as a hint to the function by setting the Seasonality argument of FORECAST.ETS to 7.
  • Depending on how you slice the data, some dates may return a blank value for Count Of Sales and if there are too many then FORECAST.ETS may not be able to return a value and return a #VALUE error instead. I had hoped setting the Data Completion argument of FORECAST.ETS to 1 would fix this but it didn’t; one solution is to alter the formulas in C3:C9 to return 0 when CUBEVALUE returns a blank value. The standard way to do this is to use ISNUMBER function as detailed on this thread, but I realised this was a great opportunity to use the new(ish) LET function to do things more elegantly.

Here’s the final version of the forecast formula in C10 with these changes:

=INT(FORECAST.ETS($B10, VALUE($C$3:$C$9), DATEVALUE($B$3:$B$9),7))

Here’s the new version of the formula in C3 as an example of this (with a newly added slicer reference too):

=LET(
CubeValueResult,
CUBEVALUE("Price Paid",$B3,C$2, Slicer_Property_Transactions1),
IF(ISNUMBER(CubeValueResult), CubeValueResult, 0)
)

That’s the simple scenario working then. But wouldn’t it be cool if we could vary the number of days of source data or the number of days forecast and put everything together into one big dynamic array so it can be charted easily? I’ll show you how in my next post!

New Options For The Table.Buffer Function In Power Query

The March 2022 release of Power BI Desktop includes some new options for the Table.Buffer M function that can be set in a new second parameter. The options are of type BufferMode.Type; if you look at the built-in documentation for this type you’ll see the following:

The two allowed values are:

  • BufferMode.Eager: The entire value is immediately buffered in memory before continuing
  • BufferMode.Delayed: The type of the value is computed immediately but its contents are not buffered until data is needed, at which point the entire value is immediately buffered

Anyone with previous experience of Table.Buffer will see that BufferMode.Eager is the existing behaviour, but what is BufferMode.Delayed for?

It turns out that it’s there to make development faster. Consider the following M query that loads data from a CSV file with seven columns and a million rows in, and then uses Table.Buffer to buffer that table into memory:

let
  Source = Csv.Document(
    File.Contents(
      "C:\NumbersMoreColumns.csv"
    ),
    [
      Delimiter  = ",",
      Columns    = 7,
      Encoding   = 65001,
      QuoteStyle = QuoteStyle.None
    ]
  ),
  #"Promoted Headers"
    = Table.PromoteHeaders(
    Source,
    [PromoteAllScalars = true]
  ),
  #"Changed Type"
    = Table.TransformColumnTypes(
    #"Promoted Headers",
    {
      {"A", Int64.Type},
      {"B", Int64.Type},
      {"C", Int64.Type},
      {"D", Int64.Type},
      {"E", Int64.Type},
      {"F", Int64.Type},
      {"G", Int64.Type}
    }
  ),
  BufferTable = Table.Buffer(
    #"Changed Type"
  )
in
  BufferTable

When I refresh this query in Power BI Desktop on my PC I see the “Evaluating…” message for 20 seconds before the data starts to load:

If, however, I add the second parameter [BufferMode = BufferMode.Delayed] to Table.Buffer like so:

let
  Source = Csv.Document(
    File.Contents(
      "C:\NumbersMoreColumns.csv"
    ), 
    [
      Delimiter  = ",", 
      Columns    = 7, 
      Encoding   = 65001, 
      QuoteStyle = QuoteStyle.None
    ]
  ), 
  #"Promoted Headers"
    = Table.PromoteHeaders(
    Source, 
    [PromoteAllScalars = true]
  ), 
  #"Changed Type"
    = Table.TransformColumnTypes(
    #"Promoted Headers", 
    {
      {"A", Int64.Type}, 
      {"B", Int64.Type}, 
      {"C", Int64.Type}, 
      {"D", Int64.Type}, 
      {"E", Int64.Type}, 
      {"F", Int64.Type}, 
      {"G", Int64.Type}
    }
  ), 
  BufferTable = Table.Buffer(
    #"Changed Type", 
    [BufferMode = BufferMode.Delayed]
  )
in
  BufferTable

Then, when I run my refresh, the “Evaluating…” message only appears very briefly before the data starts to load:

It’s important to stress that after the “Evaluating…” phase the data load takes exactly the same amount of time – it’s only the “Evaluating…” phase that is faster. This can save you a lot of time as a developer, nonetheless. I have been told when these options are available in dataflows they will make validation (which occurs when you close Power Query Online after editing a dataflow) much faster too – in fact this developed to partially solve the dataflow validation problem.

[Thanks to Curt Hagenlocher for this information]

Update: something I should make clear is that this functionality is only useful for people who are already using Table.Buffer in their queries. If you’re not using Table.Buffer already then these changes won’t be of any benefit or interest.

The ExtAuth Trace Event In Power BI

When you’re looking at the Log Analytics data for your Power BI Premium dataset, or studying a Profiler trace, you may see a command called ExtAuth in the OperationDetailName column:

It is documented here as “An internal service-generated command to perform authentication” – and as such it’s not something you have any control over. It may be triggered when a report is opened or a dataset refresh is started.

However you may wonder why it’s so slow sometimes. This is because datasets in the Power BI Service are not held in memory 100% of the time (they are sometimes paged out to disk) although they do need to be in memory before they can be queried or refreshed. When a dataset is not held in memory but the Power BI Service needs to connect to it, the dataset is loaded into the Analysis Services engine and into memory and this can take some time, especially if the dataset is large. The ExtAuth event is often the event that triggers the load of a dataset into memory and therefore the time required to load the dataset into memory is associated with it.

[Thanks to Akshai Mirchandani, Kay Unkroth and Rui Romano for the information in this post]

Why Not All SQL Queries Can Be Used As A Source In Power BI DirectQuery Mode

A few days ago I was tagged on a thread on Twitter: my old pal Mim was upset to find that he couldn’t use a SQL query with a Common Table Expression as the source of a table in DirectQuery mode. He worked out why without my help but I thought that it was worth me writing up an explanation for other people who might be struggling with the same problem.

In Power BI DirectQuery mode you can – for most data sources – use your own SQL query as the source for a table in your dataset. For example when connecting to the AdventureWorksDW 2017 sample database in SQL Server using DirectQuery mode, I can use the following SQL query as the source of a table:

SELECT 
FullDateAlternateKey, 
EnglishDayNameOfWeek, 
EnglishMonthName
FROM DimDate

Now, let’s say I use this table in a dataset and build the following report with two slicers and a table, and with a month and a day name selected in the slicers:

The SQL generated by Power BI to populate the table looks like this:

SELECT 
TOP (501) [t0].[FullDateAlternateKey]
FROM 
(
(
SELECT 
FullDateAlternateKey, 
EnglishDayNameOfWeek, 
EnglishMonthName
FROM DimDate
)
)
 AS [t0]
WHERE 
(
(
[t0].[EnglishDayNameOfWeek] = N'Sunday'
)
 AND 
(
[t0].[EnglishMonthName] = N'April'
)
)

GROUP BY [t0].[FullDateAlternateKey],[t0].[EnglishDayNameOfWeek],[t0].[EnglishMonthName]
ORDER BY [t0].[EnglishDayNameOfWeek]
ASC
,[t0].[EnglishMonthName]
ASC
,[t0].[FullDateAlternateKey]
ASC

Notice how the original SQL query I used as the source of the table is used as a subquery in this query. This is how Power BI generates its SQL and this is why there are limitations on the SQL you can use in DirectQuery mode: not all SQL queries that run standalone can be used in subqueries.

Take the following variation on the SQL above for example:

SELECT
FullDateAlternateKey,
EnglishDayNameOfWeek,
EnglishMonthName
FROM DimDate
ORDER BY
EnglishMonthName

It’s the same query as before but with an ORDER BY clause. It works fine in the Power Query Editor but when you go to your report you’ll see the following error:

“Microsoft SQL: The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries and common table expressions unless TOP, OFFSET or FOR XML is also specified”

This is the error message generated by the SQL query that Power BI is trying to run – and as you will have guessed by now, it’s being generated because Power BI is trying to use your query with an ORDER BY clause inside its own subquery. You can’t put an ORDER BY clause in a subquery in TSQL except in the scenarios mentioned in the error message. The same goes for Common Table Expressions.

What can you do to work around this? There is really just one answer: model your data the way Power BI likes it (ie as a star schema) before it gets to Power BI. It’s the now-famous Roche’s Maxim. Even if you aren’t running into this limitation using SQL queries as the source of a table is a bad idea because it makes maintenance difficult and you end up doing any expensive transformations at query time and paying the performance penalty over and over again. And yes, I know, some of you don’t have permissions to create views or get the data modelled correctly in the source and I know it’s tough, but that’s the way it is.

Refreshing Excel Power Query Queries With VBA And Power Automate For Desktop

At the end of last year two new, fairly minor, improvements were made to Excel VBA’s support for Power Query: you can now refresh and delete individual Power Query queries. These methods are available now for everyone using the Insiders build of Excel. Not particularly interesting on its own (I wrote a post a few years ago about Excel VBA and Power Query if you’re interested) but it got me thinking about this subject again. Also, at the end of last year, I upgraded to Windows 11 which has Power Automate for desktop built in. Power Automate desktop makes it super easy to automate everyday tasks and it has great support for Excel – including the ability to run Excel VBA macros. So I wondered: can you use Power Automate for desktop to automatically refresh your Excel Power Query queries? Yes, you can!

Here’s a simple example. I created a Power Query query called GetTheDateAndTime that returns the current date and time in a table with one row and column. Here’s the M code for the query:

#table(type table [RunDate=datetime], {{DateTime.FixedLocalNow()}})

Next, I created a VBA macro called RunPQQuery to refresh this Power Query query using the new refresh method I mentioned earlier:

Sub RunPQQuery()
    ActiveWorkbook.Queries("GetTheDateAndTime").Refresh
End Sub

I then saved the Excel workbook as a .xlsm file.

Next I opened Power Automate for desktop and created a new desktop flow following the instructions here to open Excel, run the macro and close Excel again. I realised that if I closed Excel immediately after running the macro it would close Excel before the Power Query query had finished, so I added a delay of ten seconds after running the macro to give it time to finish. There are probably more sophisticated ways to solve this problem: for example you could read the value of a cell in the table returned by the query that you knew would change, then after running the query loop until the value you’ve read has changed. Here’s my desktop flow:

Finally I created a cloud flow to run this desktop flow:

And that’s it! A very simple example but very easy to implement.

TSQL Table-Valued Functions And Dynamic M Parameters In Power BI DirectQuery

My favourite – and it seems many other people’s favourite – new feature in the February 2022 Power BI Desktop release is support for more datasources (including SQL Server, Azure SQL DB and Synapse) with dynamic M parameters. In my opinion dynamic M parameters are extremely important for anyone planning to use DirectQuery: they give you a lot more control over the SQL that is generated by Power BI and therefore give you a lot more control over query performance.

Teo Lachev has already stolen my thunder and blogged about how the new functionality allows you to use a TSQL stored procedure as the source of a table in DirectQuery mode. In this post I’m going to show you something very similar – but instead of using a stored procedure, I’m going to show a simple example of how to use a TSQL table-valued function, which I think has a slight advantage in terms of ease-of-use.

To start off I created a table-valued function in the Adventure Works 2017 sample database on my local SQL Server which returns some filtered data from the DimDate table:

CREATE FUNCTION [dbo].[udfDates] (
    @day_name varchar(50),
	@month_name varchar(50)
)
RETURNS TABLE
AS
RETURN
    SELECT 
        FullDateAlternateKey, EnglishDayNameOfWeek, EnglishMonthName, CalendarYear
    FROM
        DimDate
    WHERE
        EnglishDayNameOfWeek=@day_name
		and
		EnglishMonthName=@month_name;

Here’s how it can be called in a SQL SELECT statement:

select 
	FullDateAlternateKey,
    EnglishDayNameOfWeek,
    EnglishMonthName,
    CalendarYear
from 
	udfDates('Thursday', 'February')
where
	CalendarYear=2005

Next, I created a Power Query query in my dataset that called this function and returned a table in DirectQuery mode. The great thing about table-valued functions is that they appear in the Navigator pane when you connect to a SQL Server database from the Power Query Editor:

You can only hard-code the values you pass to the function’s parameters at this point but even if you do nothing here you can just return an empty table. After having done this I selected DirectQuery mode:

After this I created two Power Query text parameters, called DayName and MonthName, to hold the month and day names to be passed to the function:

I also created two Import mode queries called DayNames and MonthNames to hold all the valid values for the DayName and MonthName parameters:

The last thing to do in the Power Query Editor was to edit the query calling the function to pass the M parameters to it. Here’s the M code for the query after the modification made to the “Invoked Functiondbo_udfDates1” step:

let
  Source = Sql.Database(
    "localhost", 
    "AdventureWorksDW2017"
  ), 
  dbo_udfDates = Source
    {
      [
        Schema = "dbo", 
        Item   = "udfDates"
      ]
    }
    [Data], 
  #"Invoked Functiondbo_udfDates1"
    = dbo_udfDates(DayName, MonthName)
in
  #"Invoked Functiondbo_udfDates1"

At this point I closed the Power Query Editor and loaded the three tables to my dataset:

Next I bound the EnglishDayNameOfWeek column on the DayNames table to the DayName M parameter and the EnglishMonthName column on the MonthNames table to the MonthName M parameter:

Finally, I built a report with two slicers bound to the two dynamic M parameter columns and a table showing the output of the table-valued function:

Here’s an example of the TSQL generated by Power BI to populate the table in this report:


SELECT 
TOP (501) 
[t0].[FullDateAlternateKey],
[t0].[EnglishDayNameOfWeek],
[t0].[EnglishMonthName]
FROM 
(
(
select [$Table].[FullDateAlternateKey],
    [$Table].[EnglishDayNameOfWeek],
    [$Table].[EnglishMonthName],
    [$Table].[CalendarYear]
from [dbo].[udfDates]('Thursday', 'February') as [$Table]
)
)
 AS [t0]
WHERE 
(
[t0].[CalendarYear] = 2010
)

GROUP BY 
[t0].[FullDateAlternateKey],[t0].[EnglishDayNameOfWeek],
[t0].[EnglishMonthName],[t0].[CalendarYear]
ORDER BY [t0].[FullDateAlternateKey]
ASC
,[t0].[EnglishMonthName]
ASC
,[t0].[CalendarYear]
ASC
,[t0].[EnglishDayNameOfWeek]
ASC
 

I am by no means an expert in writing efficient TSQL so I can’t comment on the pros and cons of table-valued functions, stored procedures or using native SQL queries in Power BI (although the last of these has obvious maintainability issues). Hopefully, though, you can see the possibilities – and if you do get round to using this approach on a project, please let me know how you get on!

Speed Up Power BI Refresh By Increasing The Amount Of Memory On Your On-Premises Data Gateway Machine

If your Power BI dataset needs to connect to an on-premises data source it will need to connect via an On-Premises Data Gateway; what’s more, if you have a Power Query query that combines data from cloud and on-premises data sources, then Power BI needs to connect to all data sources used (even if they are cloud sources) via an On-Premises Data Gateway. And when Power BI connects to a data source via a gateway all the transformation work done by the Power Query engine takes place on the machine where the gateway is installed.

As a result of all this the specification of the machine where the gateway is installed has an impact on the performance of any dataset refreshes that use it. So how powerful does the machine with the gateway installed on it need to be? That’s a tough question because, as you can probably guess, it depends on a lot of different factors: how many datasets get refreshed in parallel, how often, how complex the transformations used are, if you’re using Import mode or DirectQuery, and so on. There’s a great docs article describing how to go about sizing your gateway machine here. Unsurprisingly, the more memory and CPU cores you have available the better refresh performance is likely to be and the more refreshes can take place in parallel.

There is one important thing to point out that is not obvious though: increasing the amount of memory on your gateway machine can improve refresh performance even if it doesn’t look like the machine is under memory or CPU pressure. This is because the total amount of memory made available for a single refresh is calculated relative to the overall amount of memory available on the gateway machine. I’ve written about how the Power Query engine uses memory a few times: this post describes how each refresh can use a fixed maximum amount of memory and how performance suffers if your refresh needs to use more; this post shows how increasing the amount of memory Power Query can use for a single refresh can increase refresh performance dramatically. In short, the more memory on your gateway machine the more memory is available for each individual refresh and – if the Power Query engine needs it, for example if you’re sorting, merging, pivoting/unpivoting, buffering or doing group-bys on large tables and query folding is not taking place – the faster each refresh will be.

You do have the option of changing some properties (listed here) on the gateway to try to influence this behaviour. However since the algorithms involved are not documented and may change at any time, not all the relevant properties are documented, and working out what the optimal settings are yourself is very difficult, I don’t recommend doing this. It’s a lot easier just to increase the amount of memory and CPU on the gateway machine and let the gateway work out how these resources should be used. I’m not saying that you should blindly increase your memory and CPU as much as you can, though – you should test to see what the impact on refresh performance is (the gateway logs will be useful here) and whether that impact is worth the extra cost.

Bonus tip: another easy way to improve gateway refresh performance is to enable the StreamBeforeRequestCompletes property on the gateway. It’s documented here and a few people (see here and here for example) have already blogged about how much this has helped them.

Expanding Azure Data Explorer Dynamic Columns In Power Query

Azure Data Explorer has a data type called dynamic which can be used to hold scalar values as well as arrays and property bags; you can read about it in the docs here. For example (following on from my recent series on DirectQuery on Log Analytics, starting here) the ApplicationContext column in the PowerBIDatasetsWorkspace table that holds the IDs of the dataset, report and visual that generated a DAX query (see this post for more background) is of type dynamic:

This is what the contents of the column look like:

Now you can easily extract the individual property values from this column in KQL, and indeed I did so in the KQL queries in this post, but the interesting thing is you can also extract these values in Power Query M very easily and – crucially – maintain query folding using the Record.FieldOrDefault M function in a custom column without needing to write any KQL yourself, in both Import mode and DirectQuery mode.

In order to do this, first of all you have to enter a table name or KQL query in the third parameter of AzureDataExplorer.Contents function. When you do this you can treat a dynamic column as a record even if the Power Query UI doesn’t display it as such. Here’s an example M query that shows all of this in action on the PowerBIDatasetsWorkspace table that contains Power BI data in Log Analytics:

let
  Source = AzureDataExplorer.Contents(
    "InsertClusterName", 
    "InsertDBName", 
    "PowerBIDatasetsWorkspace", 
    [
      MaxRows                 = null, 
      MaxSize                 = null, 
      NoTruncate              = null, 
      AdditionalSetStatements = null
    ]
  ), 
  #"Added Custom" = Table.AddColumn(
    Source, 
    "Custom", 
    each Record.FieldOrDefault(
      [ApplicationContext], 
      "DatasetId"
    )
  )
in
  #"Added Custom"

From this query, here’s what the Custom Column dialog for the #”Added Custom” step looks like:

Here’s the output in the Power Query Editor:

And here’s the KQL query that this M query folds to (taken from the View Native Query dialog in the Power Query Editor):

PowerBIDatasetsWorkspace
| extend ["Custom"]=["ApplicationContext"]["DatasetId"]

I admit that this is a super-obscure tip but I think it’s fascinating nonetheless, especially given how nested structures are becoming more and more common in the world of big data. It would be great to have similar behaviour in other connectors…

Thanks to my colleague Itay Sagui (whose blog has several posts on Power BI/Azure Data Explorer integration) for this information.

Monitor Power BI Queries And Refreshes With DirectQuery On Log Analytics, Part 4: Query Sessions

In my last post I showed how to use Log Analytics data to analyse Power BI query activity. The problem with looking at a long list of queries, though, is that it can be overwhelming and it can be hard to get a sense of when users were and weren’t actively interacting with a report. In this post I’ll show you how you can write a KQL query that gives you a summary view that solves this problem by grouping queries into sessions.

What is a session? I’m going to define it as a group of DAX queries – sessions – run by the same user that occur within 90 seconds of each other. If a user opens a Power BI report, clicks on a slicer or filter, changes a page or whatever, then so long as each of the DAX queries that are generated in the background when they do this have end times that are no more than 90 seconds apart, then that will count as a single session.

The key to solving this problem is the KQL scan operator. If, like me, you’re the kind of geek that likes data analysis you’re going to love the scan operator! There’s a great blog post explaining what it does here; in summary it allows you to do process mining, ie find sequences of events that match a given pattern. I can think of a few cool things I could do with it but grouping DAX queries into sessions is fairly straightforward. Here’s my KQL query:

PowerBIDatasetsWorkspace
| where TimeGenerated > ago(4h)
| where isempty(ExecutingUser)==false 
and ExecutingUser<>"Power BI Service"
and OperationName=="QueryEnd"
| project OperationName, OperationDetailName, TimeGenerated, 
EventText, DurationMs, ExecutingUser
| partition by ExecutingUser
(
sort by TimeGenerated asc
| scan declare (SessionStartTime:datetime) with
(
    step x: true => 
        SessionStartTime = 
            iif(isempty(x.SessionStartTime),
                TimeGenerated, 
                iif((TimeGenerated-x.TimeGenerated)>90s, 
                    TimeGenerated,x.SessionStartTime));
)
)
| summarize 
    SessionEndTime=max(TimeGenerated),
    QueryCount=count() 
    by ExecutingUser, SessionStartTime
| extend SessionLength = SessionEndTime - SessionStartTime
| sort by SessionStartTime asc

This is how the query works:

  • Filters the Log Analytics data down to the queries run in the last four hours which were run by a real user and not the Power BI Service
  • Splits this data into separate tables (using the KQL partition operator) for each user
  • Sorts these tables by the TimeGenerated column
  • For each of these tables, add a new column called SessionStartTime that contains the value from the TimeGenerated column and copies it down for each subsequent query if its start time is less than 90 seconds; if a query starts more than 90 seconds after the previous one then SessionStartTime contains that query’s TimeGenerated value again
  • Uses the summarize operator to group the data by the values in SessionStartTime and ExecutingUser

Here’s the output:

This query returns one row per user session where:

  • SessionStartTime is the end time of the first DAX query run in the session
  • SessionEnd is the end time of the last DAX query run in the session
  • QueryCount is the number of DAX queries run in the session
  • SessionLength is the duration of the session

Let me know if you have any ideas for modifying or improving this…

Monitor Power BI Queries And Refreshes With DirectQuery On Log Analytics, Part 3: Queries

In the first part of this series I showed how to create a Power BI DirectQuery dataset connected to Log Analytics to do near real-time monitoring of Analysis Services activity; in the second part I showed how to use this technique to monitor dataset refreshes in a workspace. In this blog post I’ll discuss how you can use this technique to monitor queries on your datasets.

As with the post on refreshes, the first and most important question to ask is: why do you need a DirectQuery dataset to monitor query activity? In all Power BI projects Import mode should be used where possible because it gives you better performance and more functionality. If you want to do long-term analysis of query activity then you’re better off importing the data from Log Analytics rather than using DirectQuery. So when and why would you use DirectQuery to monitory query activity in Power BI? The answer has to be when you need to be able to see what queries have been run recently, in the last hour or two, so when someone calls up to complain that their queries are slow you can see what queries they have just run and try and work out why they are slow.

It’s actually very easy to build a simple KQL query to look at query activity on your datasets: you just need to look at the QueryEnd event (or operation, as its called in Log Analytics), which is fired when a query finishes running. This event gives you all the information you need: the type of query (DAX or MDX), the duration, the CPU time, the query text and so on. The main challenge is that while you also get the IDs of the report and visual that generated the query, you don’t get the names of the report or visual. I wrote about how to get a list of visual and report IDs here and here, but how can you use that information?

Here’s an example KQL query that does just that:

let Reports = 
datatable (ReportId:string, ReportName:string)
[
    "Insert Report ID here", 
    "My Test Report"
];
let Visuals=
externaldata(VisualId: string,VisualTitle: string,VisualType: string)
[
    "Insert Shared Access Signature To CSV File Here"
]
with (format="csv", ignoreFirstRecord=true);
PowerBIDatasetsWorkspace | 
where TimeGenerated > ago(2h) |
where OperationName == 'QueryEnd' |
extend a = todynamic(ApplicationContext)|
extend VisualId = tostring(a.Sources[0].VisualId), 
ReportId = tostring(a.Sources[0].ReportId), 
Operation = coalesce(tostring(a.Sources[0].Operation), "Query"),
DatasetName = ArtifactName, Query = EventText |
lookup kind=leftouter Visuals on VisualId |
lookup kind=leftouter Reports on ReportId |
project 
ExecutingUser, TimeGenerated, WorkspaceName, DatasetName, 
Operation, OperationDetailName, 
VisualTitle, VisualType,
Status, DurationMs, CpuTimeMs, Query |
order by TimeGenerated desc 

To extract the report and visual IDs from the JSON in the ApplicationContext column I’ve used the todynamic() function and then split them out into separate columns. Then, in order to have tables to do the lookups, I used two different techniques. I used the datatable operator to create a hard-coded table with the report IDs and names; in this case I’ve only got one report, but even if you were doing this in the real world at the workspace level I think it’s feasible to create hard-coded tables with report IDs and names in because there wouldn’t be that many of them. For the visuals I saved a table with the visual IDs, names and types in a CSV file and saved it to ADLSgen2 storage, then used the externaldata() (with a shared access signature in this case just to make authentication simple) operator to read this data into a table used for the lookup.

Here’s the output:

It’s then easy to use this table in a Power BI DirectQuery dataset to build a report that shows you what queries have been run recently. This report looks a bit rubbish and I’m sure you can build something that looks a lot nicer:

%d bloggers like this: