Finding Power BI Semantic Model Refresh Operations In Gateway Logs

The logs for your on-premises data gateway contain a lot of information that is useful for troubleshooting and performance tuning. The contents of the logs are documented here, and there are several great solutions out there for reporting on their contents: there’s a Microsoft-provided template and Rui Romano’s more sophisticated monitoring solution here (there’s also another, even better community solution being worked on that will be publicly available soon). To do any troubleshooting or performance tuning, though, you need to link the data generated by Power BI in Log Analytics or Profiler traces to the data in the gateway logs. How can you do this?

Let’s take the refresh of an Import mode semantic model that uses an on-premises SQL Server database as a source as an example. If you have Log Analytics connected to your workspace, you can run a KQL query something like this to find the events generated by Power BI for your semantic model refresh:

PowerBIDatasetsWorkspace
| where TimeGenerated > ago(1h)
| where OperationName in ("CommandBegin", "CommandEnd", "ProgressReportBegin", "ProgressReportEnd")
//| where OperationName in ("ExecutionMetrics")
| project OperationName, OperationDetailName, EventText, TimeGenerated, XmlaRequestId 
| order by TimeGenerated asc

All the events associated with the refresh command will have the same value in the XmlaRequestId column. If you’re using a Profiler trace to capture the same information, the RequestID column will contain the same value.
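
For example, once you know the XmlaRequestId of the refresh you’re interested in, a query like this sketch returns only that operation’s events (the GUID below is just a placeholder):

PowerBIDatasetsWorkspace
| where TimeGenerated > ago(1h)
//replace the placeholder GUID with the XmlaRequestId of your refresh
| where XmlaRequestId == "00000000-0000-0000-0000-000000000000"
| project OperationName, OperationDetailName, EventText, TimeGenerated
| order by TimeGenerated asc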

If you then export the gateway logs and look in the Query Execution Report log file, for example (I did it with Power Query – it’s just a CSV file), you can match the XmlaRequestId value from Log Analytics to the value in the RequestId column in the log file.

In this example my semantic model contains two tables sourced from an on-premises SQL Server database, so there are two entries in the Query Execution Report log file, one for each table. Once you’ve found the matching rows in the Query Execution Report log file you can find all kinds of useful information, such as how much time (if any) it took to read data from the source or to spool data to disk. You can do the same thing with the other gateway log files: for example, the Open Connection Report will tell you how long it took to open any connections to the data source.
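
If you prefer to do the matching in KQL rather than Power Query, here’s a rough sketch of one way to do it. It assumes you’ve uploaded the exported Query Execution Report CSV to blob storage; the storage URL, SAS token and CSV column names below are all placeholders, so check your own export for the exact headers:

//NOTE: the CSV column names and the storage URL/SAS token are illustrative placeholders
let GatewayQueryExecutionReport =
    externaldata (RequestId:string, QueryExecutionDurationMs:long, SpoolingDurationMs:long)
    [ @"https://<yourstorageaccount>.blob.core.windows.net/<container>/QueryExecutionReport.csv?<SAS token>" ]
    with (format="csv", ignoreFirstRecord=true);
PowerBIDatasetsWorkspace
| where TimeGenerated > ago(1h)
| where OperationName in ("CommandBegin", "CommandEnd")
| join kind=inner GatewayQueryExecutionReport on $left.XmlaRequestId == $right.RequestId
| project XmlaRequestId, OperationName, TimeGenerated, QueryExecutionDurationMs, SpoolingDurationMs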

If you have a DirectQuery model connected to an on-premises data source then you can do the same thing: all the events associated with a DAX query (such as the Query Begin/End events) will have the same XmlaRequestId, and again all you need to do is find that value in the RequestId column in the gateway logs.
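
For example, a query along these lines (assuming the Query Begin/End events appear in Log Analytics with the OperationName values QueryBegin and QueryEnd) returns the DAX queries and their XmlaRequestId values:

PowerBIDatasetsWorkspace
| where TimeGenerated > ago(1h)
| where OperationName in ("QueryBegin", "QueryEnd")
| project OperationName, EventText, TimeGenerated, XmlaRequestId
| order by TimeGenerated asc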

4 thoughts on “Finding Power BI Semantic Model Refresh Operations In Gateway Logs”

  1. Gateway logs are flushed to disk very infrequently, which makes them unusable for near-real-time analytics.

    I have yet to speak to another customer who actually was willing to pay for Log Analytics in a sustained fashion. It’s just way too expensive.

    1. Have it come down as a mandate from your security and fraud prevention people. It very quickly turns that decision around.

  2. Chris, thanks for this blog.

    I have three cases open right now where the failures are not being made transparent to customers. There are fail/retry loops happening in the service itself. On my side, the only obvious clues are that the same mashups are running over and over. It is causing a lot of problems for us (users and back-end resource providers).

    The other clue is a vague message in the gateway’s “report” directory that says there was a “spooling failure” for one of the mashups. The mashup logs themselves are very clean.

    I spoke with a PTA and she said there are “n number of reasons” for failures in the Power BI service. Given that customers are paying the price for those failures, we think we should be allowed to have better visibility. Currently neither the customers nor mindtree are able to see the failures. They are totally hidden. The dataflow refresh history says our refreshes succeed, but it often takes a couple hours longer than expected, and it is a harrowing journey to get to the point where these things are succeeding.

    1. We had a similar set of errors start occurring a couple of weeks ago – large datasets that had previously been refreshing without issue from on-prem SQL servers, and had not been altered, began consuming effectively all the RAM on our gateway servers, even after we significantly increased the RAM. As far as we could tell, something was – as you say – causing the server to attempt to spool the same mashup containers over and over. The lack of transparency into these links in the chain between Fabric and on-prem servers is extremely frustrating, and while the connection to the log analytics in this post is somewhat helpful, getting at the log analytics is expensive and challenging. As with a lot of the Fabric admin experience, it just seems like getting at this information should be easier.
