When you’re looking at the Log Analytics data for your Power BI Premium dataset, or studying a Profiler trace, you may see a command called ExtAuth in the OperationDetailName column:
It is documented here as “An internal service-generated command to perform authentication” – and as such it’s not something you have any control over. It may be triggered when a report is opened or a dataset refresh is started.
However, you may wonder why it is sometimes so slow. This is because datasets in the Power BI Service are not held in memory 100% of the time (they are sometimes paged out to disk), although they do need to be in memory before they can be queried or refreshed. When a dataset is not in memory but the Power BI Service needs to connect to it, the dataset is loaded into the Analysis Services engine and into memory, which can take some time, especially if the dataset is large. The ExtAuth event is often the event that triggers the load of a dataset into memory, so the time taken to load the dataset is associated with it.
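If you want to see how often this happens for a particular dataset and how long these events take, a minimal KQL sketch along the following lines should do the job; it assumes your Power BI log data is in the PowerBIDatasetsWorkspace table used by the Log Analytics integration and that a 24 hour window is enough:
//find recent ExtAuth events and see how long they took
PowerBIDatasetsWorkspace
| where TimeGenerated > ago(24h)
| where OperationDetailName == 'ExtAuth'
| project TimeGenerated, ArtifactName, OperationName, DurationMs
| order by TimeGenerated desc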
[Thanks to Akshai Mirchandani, Kay Unkroth and Rui Romano for the information in this post]
In the first part of this series I showed how to create a Power BI DirectQuery dataset connected to Log Analytics to do near real-time monitoring of Analysis Services activity; in the second part I showed how to use this technique to monitor dataset refreshes in a workspace. In this blog post I’ll discuss how you can use this technique to monitor queries on your datasets.
As with the post on refreshes, the first and most important question to ask is: why do you need a DirectQuery dataset to monitor query activity? In all Power BI projects Import mode should be used where possible because it gives you better performance and more functionality. If you want to do long-term analysis of query activity then you’re better off importing the data from Log Analytics rather than using DirectQuery. So when and why would you use DirectQuery to monitor query activity in Power BI? The answer has to be when you need to see what queries have been run recently, in the last hour or two, so that when someone calls up to complain that their queries are slow you can see what they have just run and try to work out why.
It’s actually very easy to build a simple KQL query to look at query activity on your datasets: you just need to look at the QueryEnd event (or operation, as it’s called in Log Analytics), which is fired when a query finishes running. This event gives you all the information you need: the type of query (DAX or MDX), the duration, the CPU time, the query text and so on. The main challenge is that while you also get the IDs of the report and visual that generated the query, you don’t get their names. I wrote about how to get a list of visual and report IDs here and here, but how can you use that information?
Here’s an example KQL query that does just that:
let Reports =
datatable (ReportId:string, ReportName:string)
[
"Insert Report ID here",
"My Test Report"
];
let Visuals =
externaldata(VisualId: string,VisualTitle: string,VisualType: string)
[
"Insert Shared Access Signature To CSV File Here"
]
with (format="csv", ignoreFirstRecord=true);
PowerBIDatasetsWorkspace
| where TimeGenerated > ago(2h)
| where OperationName == 'QueryEnd'
| extend a = todynamic(ApplicationContext)
| extend VisualId = tostring(a.Sources[0].VisualId),
ReportId = tostring(a.Sources[0].ReportId),
Operation = coalesce(tostring(a.Sources[0].Operation), "Query"),
DatasetName = ArtifactName, Query = EventText
| lookup kind=leftouter Visuals on VisualId
| lookup kind=leftouter Reports on ReportId
| project ExecutingUser, TimeGenerated, WorkspaceName, DatasetName,
ReportName, VisualTitle, VisualType, Operation,
Status, DurationMs, CpuTimeMs, Query
| order by TimeGenerated desc
To extract the report and visual IDs from the JSON in the ApplicationContext column I’ve used the todynamic() function and then split them out into separate columns. Then, in order to have tables to do the lookups, I used two different techniques. I used the datatable operator to create a hard-coded table with the report IDs and names; in this case I’ve only got one report, but even if you were doing this in the real world at the workspace level I think it’s feasible to create a hard-coded table of report IDs and names because there wouldn’t be that many of them. For the visuals I saved a table with the visual IDs, names and types to a CSV file in ADLSgen2 storage, then used the externaldata() operator (with a shared access signature in this case, just to make authentication simple) to read this data into a table used for the lookup.
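For reference, the CSV file that externaldata() reads just needs a header row (which is skipped because of ignoreFirstRecord=true) followed by one row per visual, with columns matching the schema declared in the query. The IDs, titles and types below are made-up placeholders:
VisualId,VisualTitle,VisualType
0123456789abcdef0123,My Card,card
123456789abcdef01234,My Column Chart,columnChart
23456789abcdef012345,My Table,table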
Here’s the output of the KQL query:
It’s then easy to use this table in a Power BI DirectQuery dataset to build a report that shows you what queries have been run recently. This report looks a bit rubbish and I’m sure you can build something that looks a lot nicer:
In the first post in this series I showed how it was possible to create a DirectQuery dataset connected to Log Analytics so you could analyse Power BI query and refresh activity in near real-time. In this post I’ll take a closer look into how you can use this data to monitor refreshes.
The focus of this series is using DirectQuery on Log Analytics to get up-to-date information (remember there’s already an Import-mode report you can use for long-term analysis of this data), and this influences the design of the dataset and report. If you’re an administrator these are the refresh-related questions you will need answers to:
Which datasets are refreshing right now?
Which recent dataset refreshes succeeded and which ones failed?
If a dataset refresh failed, why did it fail?
How long did these refreshes take?
To answer these questions I built a simple dataset with two tables in it. First of all, I built a KQL query that answers these four questions and used it as the source for a DirectQuery table in my dataset. Here’s the query:
//get all refresh events
let RefreshesInLast24Hours =
//this table holds all the Power BI log data for a workspace
PowerBIDatasetsWorkspace
//assume that no currently running refresh has run for more than 24 hours
| where TimeGenerated > ago(24h)
//only return the events fired when refreshes begin and end
| where OperationName in ('CommandBegin', 'CommandEnd')
and EventText has_any ('<Refresh xmlns', '\"refresh\"');
//gets just the events fired when the refreshes begin
let CommandBegins =
RefreshesInLast24Hours
| where OperationName == 'CommandBegin' ;
//gets just the events fired when the refreshes end
let CommandEnds =
RefreshesInLast24Hours
| where OperationName == 'CommandEnd' ;
//do a full outer join between the previous two tables
CommandBegins
| join kind=fullouter CommandEnds on XmlaRequestId, EventText
//select just the columns relevant to refreshes
| project ArtifactName, XmlaRequestId,
StartTime = TimeGenerated, EndTime = TimeGenerated1,
DurationMs = DurationMs1, CpuTimeMs = CpuTimeMs1,
Outcome = coalesce(Status1, 'In Progress'),
ErrorCode = StatusCode1,
WorkspaceId, WorkspaceName
//return one row per refresh operation
| summarize StartTime=min(StartTime), EndTime=max(EndTime), DurationMs=max(DurationMs),
CpuTimeMs=max(CpuTimeMs), ErrorCode=max(ErrorCode)
by XmlaRequestId, ArtifactName, Outcome, WorkspaceId, WorkspaceName
Here’s an example of the output, a table with one row for each refresh in the last 24 hours:
Here’s how the query works:
Gets all the CommandBegin and CommandEnd events – the events that are fired when a refresh begins and ends – that occurred in the last 24 hours and stores them in two tables.
Only some of the CommandBegin and CommandEnd events are related to refreshes, so the query filters on the EventText column to find them.
I chose to limit the data to the last 24 hours to keep the amount of data displayed to a minimum and because it’s unlikely that any refresh will take more than 24 hours. If I limited the data to the last hour, for example, then the query would not return the CommandBegin events or the CommandEnd events for any refreshes that started more than an hour ago and were still in progress, so these refreshes would not be visible in the report.
Does a full outer join between the table with the CommandBegin events and the table with the CommandEnd events, so the query returns one row for every refresh that either started or ended in the last 24 hours.
Limits the columns in the results to show:
XmlaRequestId – the unique identifier that links all the events for a specific refresh operation
ArtifactName – the dataset name
WorkspaceId, WorkspaceName – the ID and name of the Power BI workspace where the dataset is stored
StartTime and EndTime – the refresh start time and end time
DurationMs – the duration of the refresh in milliseconds
CpuTimeMs – the CPU time consumed by the refresh in milliseconds
Outcome – whether the refresh succeeded, failed or is still in progress
ErrorCode – the error code that tells you why the refresh failed
Next I created a second table to return more detailed information about a specific refresh. Here’s an example of the KQL:
PowerBIDatasetsWorkspace
| where TimeGenerated > ago(24h)
| where OperationName in
('CommandBegin', 'CommandEnd', 'ProgressReportEnd', 'Error')
| where XmlaRequestId in
('Insert XmlaRequestId here')
| project OperationName, OperationDetailName, TimeGenerated,
EventText, XmlaRequestId, DurationMs, CpuTimeMs
This query returns the CommandBegin, CommandEnd, ProgressReportEnd and Error events for a specific refresh (so all these events have the same XmlaRequestId value).
One last complication was that, in my Power BI dataset, I was not able to create a relationship between these two tables (I got some kind of query folding error, although I don’t know why), so in order to be able to drill from the summary table to the detail table I had to use a dynamic M parameter for the XmlaRequestId used to filter the second table (see this post for more details on how to implement dynamic M parameters when working with KQL queries).
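To give a rough idea of what that looks like, here’s a minimal sketch rather than the exact code from my dataset: it assumes an M parameter called XmlaRequestIdParam that is bound to a dynamic M parameter in the model, reuses the Cluster and Database parameters mentioned below, and splices the selected XmlaRequestId into the KQL text before it is run by the Azure Data Explorer connector.
let
    //XmlaRequestIdParam is bound to a dynamic M parameter so its value changes when the report is filtered
    KQL = "PowerBIDatasetsWorkspace | where TimeGenerated > ago(24h) | where XmlaRequestId == '" & XmlaRequestIdParam & "' | project OperationName, OperationDetailName, TimeGenerated, EventText, XmlaRequestId, DurationMs, CpuTimeMs",
    //run the KQL against the Log Analytics workspace via the Azure Data Explorer connector
    Source = AzureDataExplorer.Contents(Cluster, Database, KQL)
in
    Source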
Here’s the report I built on this dataset showing how you can drill from the summary page to the details page for a particular refresh:
Not exactly pretty, but I’m not much of a data visualisation person and I think tables are fine for this type of data. You can download a sample .pbix file (I couldn’t make a .pbit file work for some reason…) for this report here and prettify it yourself if you want. All you need to do is change the Cluster and Database parameters in the Power Query Editor to match the details of your Log Analytics workspace and database (see my last post for more details).
[Thanks to Chris Jones at Microsoft for his ideas and suggestions on this topic]
[Update 10th January 2022 – I’ve modified the summary KQL query based on feedback from another Microsoft colleague, Xiaodong Zhang, who pointed out that it didn’t work for TMSL scripts or datasets that use incremental refresh]
As a Power BI administrator you want to see what’s happening in your tenant right now: who’s running queries, which datasets are refreshing and so on. That way if a user calls you to complain that their report is slow or their dataset hasn’t refreshed yet you can start troubleshooting immediately. Power BI’s integration with Log Analytics (currently in preview with some limitations) is a great source of information for this kind of troubleshooting: it gives you the ability to send various useful Analysis Services engine events, events that give you detailed information about queries and refreshes among other things, to Log Analytics with a latency of only a few minutes. Once you’ve done that you can write KQL queries to understand what’s going on, but writing queries is time consuming – what you want, of course, is a Power BI report.
The dataset behind that Power BI report will need to be in DirectQuery mode if you want to see the most up-to-date data; unfortunately the current Log Analytics for Power BI Datasets template app is Import mode, which is only good if you want to look at long-term trends. It’s not obvious how to connect Power BI to Log Analytics in DirectQuery mode but it is possible: some time ago John White wrote a detailed blog post showing how you could use the Azure Data Explorer connector to allow Power BI to connect to Log Analytics in DirectQuery mode. If you read the comments on that post you’ll see that this method stopped working soon afterwards, although there was a workaround using an older version of the connector. The good news is that as of the December 2021 release of Power BI Desktop the bug was fixed and the details in John’s post are correct again. There is one other issue that needs mentioning: at the time of writing it only seems to be possible to create a DirectQuery dataset on Log Analytics if you specify a native KQL query, but since that can be simply the name of the table you want to query in Log Analytics that’s not much of a blocker.
Putting this all together, to be able to build a Power BI DirectQuery dataset and report to analyse query and refresh activity, you need to:
Set up a Log Analytics workspace in the Azure Portal – see here for instructions.
Connect the Power BI Service to this new Log Analytics workspace – see here for instructions – so that query and refresh events are logged to it.
Open up Power BI Desktop and connect to the Log Analytics workspace using the Azure Data Explorer connector (a rough sketch of the M query these steps produce is shown after this list). In the connection dialog:
Enter a URL pointing to your Log Analytics workspace in the Cluster box, in the following format: https://ade.loganalytics.io/subscriptions/enter Azure subscription id here/resourcegroups/enter Azure resource group name here/providers/microsoft.operationalinsights/workspaces/enter Log Analytics workspace name here
Enter your Log Analytics workspace name in the Database box
In the “Table name or Azure Data Explorer query (optional)” box you need to enter a custom KQL query, which can simply be the following table name: PowerBIDatasetsWorkspace
Scroll down to the end of the dialog (which may be off the screen) and select DirectQuery for the Data Connectivity mode.
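For reference, the M query that these steps generate boils down to a single call to the connector’s AzureDataExplorer.Contents function. A rough sketch, using the placeholder values from the steps above, looks something like this:
let
    //the first argument is the ade.loganalytics.io URL described above, the second is the workspace name
    Source = AzureDataExplorer.Contents(
        "https://ade.loganalytics.io/subscriptions/enter Azure subscription id here/resourcegroups/enter Azure resource group name here/providers/microsoft.operationalinsights/workspaces/enter Log Analytics workspace name here",
        "enter Log Analytics workspace name here",
        "PowerBIDatasetsWorkspace"
    )
in
    Source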
At this point you’ll have a dataset with a single table in DirectQuery mode that you can build your reports from; you might also want to use automatic page refresh to make sure the report page always shows the latest data. For example, here’s a very simple report showing query durations:
How do you interpret the data in the table in the dataset? Are there any interesting analyses you can do with more complex KQL queries or different ways of modelling the data? All good questions and ones that are out of scope for now. I’ll try to answer these questions in my next few posts though.
The integration between Azure Log Analytics and Power BI opens up a lot of new possibilities for Power BI administrators. It’s still in preview (there’s a lot more cool stuff still to come) but the Analysis Services events that are available at the time of writing are still very useful: they give you pretty much everything you had in Profiler plus some new stuff too. In this post I’ll show you how you can use this data to work out which visual in a published report generated a particular DAX query.
In Log Analytics you can get information about all the queries run against a dataset using a simple KQL query like this one (I haven’t included an explicit time filter in this query, although you would want to do this in real life):
PowerBIDatasetsWorkspace
| where ArtifactName == "InsertDatasetNameHere"
and OperationName == 'QueryEnd'
The QueryEnd event is raised every time a query finishes running – it’s an event you may already be familiar with from Profiler, xEvents or the Log Analytics integration with AAS – and it tells you useful things like the time the query finished, the duration, the text of the DAX query, the user running the query and so on. However, what is new for Power BI is the contents of the ApplicationContext column. My colleague Kasper blogged about how you can use this column to get the DatasetId and ReportId when using AAS here; it now also contains a GUID that identifies the exact visual on a report that generated the DAX query.
Taking a closer look at this column shows that it contains a JSON document:
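The exact values vary from query to query, but it is structured something like this (the IDs below are placeholders, not real output):
{
  "DatasetId": "00000000-0000-0000-0000-000000000000",
  "Sources": [
    {
      "ReportId": "11111111-1111-1111-1111-111111111111",
      "VisualId": "22222222222222222222"
    }
  ]
}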
Here’s a more sophisticated KQL query that parses that JSON document, extracts the contents into separate columns and returns other columns you might need for troubleshooting like the DAX query text and query duration:
PowerBIDatasetsWorkspace
| where ArtifactName == "InsertDatasetNameHere"
and OperationName == 'QueryEnd'
| extend a = todynamic(ApplicationContext)
| extend VisualId = a.Sources[0].VisualId, ReportId = a.Sources[0].ReportId,
DatasetName = ArtifactName, DAXQuery = EventText
| project TimeGenerated, WorkspaceName, DatasetName, ReportId, VisualId, DurationMs, DAXQuery
| order by TimeGenerated desc
The ID of the dataset isn’t that interesting because you can get it, and the name of the dataset, in other columns. The ReportId is useful because it tells you which report generated the query and there are a number of ways you can find out which report this relates to. For example, you can just open the report in the browser and look at the url: the Report ID is the GUID between “/reports/” and “/ReportSection” as detailed here.
How can you work out which visual the VisualId relates to though? It’s not obvious, but it is possible. First you have to open your report in Power BI Desktop, open Performance Analyzer, refresh the report page and export the data from Performance Analyzer as a JSON file as detailed here. The data you need is hidden in this JSON file in the Visual Container Lifecycle event; here’s the M code for a Power Query query to get it:
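A minimal Power Query sketch for this is shown below; treat it as a starting point rather than the exact code, because the file path is a placeholder and the field names inside the events and metrics records (name, metrics, visualId, visualTitle, visualType) are assumptions that you should check against your own exported file:
let
    //load the Performance Analyzer export - the file path is a placeholder
    Source = Json.Document(File.Contents("C:\PowerBIPerformanceData.json")),
    //the export contains a list of events
    Events = Source[events],
    ToTable = Table.FromList(Events, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    Expanded = Table.ExpandRecordColumn(ToTable, "Column1", {"name", "metrics"}, {"name", "metrics"}),
    //keep only the Visual Container Lifecycle events
    Lifecycle = Table.SelectRows(Expanded, each [name] = "Visual Container Lifecycle"),
    //the metrics record holds the visual's ID, title and type (field names assumed)
    ExpandedMetrics = Table.ExpandRecordColumn(Lifecycle, "metrics", {"visualId", "visualTitle", "visualType"}, {"VisualId", "VisualTitle", "VisualType"}),
    Output = Table.Distinct(Table.SelectColumns(ExpandedMetrics, {"VisualId", "VisualTitle", "VisualType"}))
in
    Output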
Here’s an example of the output of this Power Query query:
This shows that, on the report in question, there were three visuals: a card visual called “My Card”, a column chart called “My Column Chart” and a table called “My Table”. It also shows the IDs of these visuals which you can match up with the VisualId values in the ApplicationContext column in Log Analytics.
The last problem is matching these names up with the actual visuals in the report, and this will only be easy if the report designer has followed the best practice of giving each visual a meaningful name; you can now do this by double-clicking on the visual in the Selection pane or by editing the Title Text property, as Reid Havens shows here.