In my last post I showed how to use Log Analytics data to analyse Power BI query activity. The problem with looking at a long list of queries, though, is that it can be overwhelming and it can be hard to get a sense of when users were and weren’t actively interacting with a report. In this post I’ll show you how you can write a KQL query that gives you a summary view that solves this problem by grouping queries into sessions.
What is a session? I’m going to define it as a group of DAX queries run by the same user that occur within 90 seconds of each other. If a user opens a Power BI report, clicks on a slicer or filter, changes a page or whatever, then as long as each of the DAX queries generated in the background has an end time no more than 90 seconds after the previous one, that all counts as a single session. For example, queries ending at 10:00:00, 10:01:15 and 10:04:00 would be grouped into two sessions: the first two queries in one and the third in another.
The key to solving this problem is the KQL scan operator. If, like me, you’re the kind of geek that likes data analysis, you’re going to love the scan operator! There’s a great blog post explaining what it does here; in summary, it allows you to do process mining, i.e. find sequences of events that match a given pattern. I can think of a few cool things I could do with it, but grouping DAX queries into sessions is fairly straightforward.
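To get a feel for how scan works before looking at the real query, here’s a minimal example that has nothing to do with Power BI: it just uses scan to compute a running total over a generated sequence of numbers.

range i from 1 to 5 step 1
| scan declare (RunningTotal:long=0) with
(
    // for each row, add the current value of i to the previous row's RunningTotal
    step s: true => RunningTotal = i + s.RunningTotal;
)

The declared column carries state forward from one row to the next, which is exactly what’s needed to decide whether a query belongs to the current session or starts a new one. Here’s my KQL query: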
PowerBIDatasetsWorkspace
| where TimeGenerated > ago(4h)
| where isempty(ExecutingUser)==false and ExecutingUser<>"Power BI Service" and OperationName=="QueryEnd"
| project OperationName, OperationDetailName, TimeGenerated, EventText, DurationMs, ExecutingUser
| partition by ExecutingUser
(
    sort by TimeGenerated asc
    | scan declare (SessionStartTime:datetime) with
    (
        step x: true => SessionStartTime = iif(isempty(x.SessionStartTime), TimeGenerated, iif((TimeGenerated-x.TimeGenerated)>90s, TimeGenerated, x.SessionStartTime));
    )
)
| summarize SessionEndTime=max(TimeGenerated), QueryCount=count() by ExecutingUser, SessionStartTime
| extend SessionLength = SessionEndTime - SessionStartTime
| sort by SessionStartTime asc
This is how the query works:
- Filters the Log Analytics data down to the queries from the last four hours that were run by a real user and not by the Power BI Service
- Splits this data into separate tables (using the KQL partition operator) for each user
- Sorts these tables by the TimeGenerated column
- For each of these tables, adds a new column called SessionStartTime: for the first query it contains that query’s TimeGenerated value; for each subsequent query, if it ends no more than 90 seconds after the previous query, the previous SessionStartTime value is copied down, and if it ends more than 90 seconds after the previous query, SessionStartTime is reset to that query’s TimeGenerated value, marking the start of a new session (see the small example with made-up data after this list)
- Uses the summarize operator to group the data by the values in SessionStartTime and ExecutingUser
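To see the session logic in isolation, you can run the same partition/scan pattern over a few hard-coded timestamps; this is just a quick sketch with made-up data rather than Log Analytics data:

datatable(ExecutingUser:string, TimeGenerated:datetime)
[
    "user1@contoso.com", datetime(2022-06-01 10:00:00),
    "user1@contoso.com", datetime(2022-06-01 10:01:00),
    "user1@contoso.com", datetime(2022-06-01 10:05:00)
]
| partition by ExecutingUser
(
    sort by TimeGenerated asc
    // copy SessionStartTime down while the gap between queries is 90 seconds or less
    | scan declare (SessionStartTime:datetime) with
    (
        step x: true => SessionStartTime = iif(isempty(x.SessionStartTime), TimeGenerated, iif((TimeGenerated - x.TimeGenerated) > 90s, TimeGenerated, x.SessionStartTime));
    )
)
| summarize SessionEndTime=max(TimeGenerated), QueryCount=count() by ExecutingUser, SessionStartTime

The first two timestamps are only 60 seconds apart so they end up in one session with QueryCount=2, while the third is four minutes later and gets a session of its own.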
Here’s the output of the full query:

This query returns one row per user session where:
- SessionStartTime is the end time of the first DAX query run in the session
- SessionEndTime is the end time of the last DAX query run in the session
- QueryCount is the number of DAX queries run in the session
- SessionLength is the duration of the session
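One easy modification is to pull the 90-second threshold out into a let statement so it can be tuned in one place; here’s a sketch of that (the sessionGap name is just my choice):

let sessionGap = 90s;
PowerBIDatasetsWorkspace
| where TimeGenerated > ago(4h)
| where isempty(ExecutingUser)==false and ExecutingUser<>"Power BI Service" and OperationName=="QueryEnd"
| project OperationName, OperationDetailName, TimeGenerated, EventText, DurationMs, ExecutingUser
| partition by ExecutingUser
(
    sort by TimeGenerated asc
    | scan declare (SessionStartTime:datetime) with
    (
        step x: true => SessionStartTime = iif(isempty(x.SessionStartTime), TimeGenerated, iif((TimeGenerated-x.TimeGenerated)>sessionGap, TimeGenerated, x.SessionStartTime));
    )
)
| summarize SessionEndTime=max(TimeGenerated), QueryCount=count() by ExecutingUser, SessionStartTime
| extend SessionLength = SessionEndTime - SessionStartTime
| sort by SessionStartTime asc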
Let me know if you have any ideas for modifying or improving this…
In my opinion, adding the ReportId would also be necessary. Grouping queries from the same user but different reports may not be that accurate. Depends on the scenario, though.
For this, I took what you did in Part 3 of this series and just added this piece of code to the one in this article:
| extend Sources = parse_json(ApplicationContext).Sources
| mv-expand Sources
| extend ReportId = tostring(parse_json(Sources).ReportId)
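Since partition by takes a single column, one way to slot those lines into the query from the post (just an untested sketch; SessionKey is an arbitrary name for the composite key) is:

PowerBIDatasetsWorkspace
| where TimeGenerated > ago(4h)
| where isempty(ExecutingUser)==false and ExecutingUser<>"Power BI Service" and OperationName=="QueryEnd"
| extend Sources = parse_json(ApplicationContext).Sources
| mv-expand Sources
| extend ReportId = tostring(parse_json(Sources).ReportId)
// combine user and report into a single key to partition on
| extend SessionKey = strcat(ExecutingUser, "|", ReportId)
| project SessionKey, ExecutingUser, ReportId, TimeGenerated, EventText, DurationMs
| partition by SessionKey
(
    sort by TimeGenerated asc
    | scan declare (SessionStartTime:datetime) with
    (
        step x: true => SessionStartTime = iif(isempty(x.SessionStartTime), TimeGenerated, iif((TimeGenerated-x.TimeGenerated)>90s, TimeGenerated, x.SessionStartTime));
    )
)
| summarize SessionEndTime=max(TimeGenerated), QueryCount=count() by ExecutingUser, ReportId, SessionStartTime
| extend SessionLength = SessionEndTime - SessionStartTime
| sort by SessionStartTime asc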
Hi, we’re seeing a lot of Query events without Sources in ApplicationContext (only DatasetId). The executing user is every user with access to the workspace, and they happen in the middle of the night. We have no subscriptions and/or auto page refresh in our reports. Do you know what this might be? Can it be related to some query cache being refreshed? There’s a ton of heavy querying going on that wasn’t initiated by a user.
Got it. Query caching. From https://docs.microsoft.com/en-us/power-bi/connect-data/power-bi-query-caching
“The query cache is refreshed when Power BI performs a dataset refresh. When the query cache is refreshed, Power BI must run queries against the underlying data models to get the latest results. If a large number of datasets have query caching enabled and the Premium/Embedded capacity is under heavy load, some performance degradation may occur during cache refresh. Degradation results from the increased volume of queries being executed.”
Very strange caching pattern. For us this means three hours of query CPU time following every dataset refresh, which affects cost allocation a lot for us as we share Premium capacity cost globally based on CPU. Wouldn’t it be better to simply apply a Cache-Aside/Lazy Loading pattern?