Expanding Azure Data Explorer Dynamic Columns In Power Query

Azure Data Explorer has a data type called dynamic which can be used to hold scalar values as well as arrays and property bags; you can read about it in the docs here. For example (following on from my recent series on DirectQuery on Log Analytics, starting here) the ApplicationContext column in the PowerBIDatasetsWorkspace table that holds the IDs of the dataset, report and visual that generated a DAX query (see this post for more background) is of type dynamic:

This is what the contents of the column look like:

Now, you can easily extract the individual property values from this column in KQL, and indeed I did so in the KQL queries in this post. The interesting thing is that you can also extract these values very easily in Power Query M, using the Record.FieldOrDefault M function in a custom column, and – crucially – maintain query folding without needing to write any KQL yourself, in both Import mode and DirectQuery mode.

In order to do this, first of all you have to enter a table name or KQL query in the third parameter of the AzureDataExplorer.Contents function. When you do this you can treat a dynamic column as a record, even if the Power Query UI doesn’t display it as such. Here’s an example M query that shows all of this in action on the PowerBIDatasetsWorkspace table that contains Power BI data in Log Analytics:

let
  Source = AzureDataExplorer.Contents(
    "InsertClusterName", 
    "InsertDBName", 
    "PowerBIDatasetsWorkspace", 
    [
      MaxRows                 = null, 
      MaxSize                 = null, 
      NoTruncate              = null, 
      AdditionalSetStatements = null
    ]
  ), 
  #"Added Custom" = Table.AddColumn(
    Source, 
    "Custom", 
    each Record.FieldOrDefault(
      [ApplicationContext], 
      "DatasetId"
    )
  )
in
  #"Added Custom"

From this query, here’s what the Custom Column dialog for the #"Added Custom" step looks like:

Here’s the output in the Power Query Editor:

And here’s the KQL query that this M query folds to (taken from the View Native Query dialog in the Power Query Editor):

PowerBIDatasetsWorkspace
| extend ["Custom"]=["ApplicationContext"]["DatasetId"]

I admit that this is a super-obscure tip but I think it’s fascinating nonetheless, especially given how nested structures are becoming more and more common in the world of big data. It would be great to have similar behaviour in other connectors…

Thanks to my colleague Itay Sagui (whose blog has several posts on Power BI/Azure Data Explorer integration) for this information.

Monitor Power BI Queries And Refreshes With DirectQuery On Log Analytics, Part 2: Dataset Refreshes

In the first post in this series I showed how it was possible to create a DirectQuery dataset connected to Log Analytics so you could analyse Power BI query and refresh activity in near real-time. In this post I’ll take a closer look into how you can use this data to monitor refreshes.

The focus of this series is using DirectQuery on Log Analytics to get up-to-date information (remember there’s already an Import-mode report you can use for long-term analysis of this data), and this influences the design of the dataset and report. If you’re an administrator these are the refresh-related questions you will need answers to:

  1. Which datasets are refreshing right now?
  2. Which recent dataset refreshes succeeded and which ones failed?
  3. If a dataset refresh failed, why did it fail?
  4. How long did these refreshes take?

To answer these questions I built a simple dataset with two tables in it. First of all, I built a KQL query that answers these four questions and used it as the source for a DirectQuery table in my dataset. Here’s the query:

let
//get all refresh events
RefreshesInLast24Hours = 
//this table holds all the Power BI log data for a workspace
PowerBIDatasetsWorkspace
//assume that no currently running refresh has run for more than 24 hours
| where TimeGenerated > ago(24h)
//only return the events fired when refreshes begin and end
| where OperationName in ('CommandBegin', 'CommandEnd')  
and EventText has_any ('<Refresh xmlns', '\"refresh\"');
//gets just the events fired when the refreshes begin
let
CommandBegins = 
RefreshesInLast24Hours
| where OperationName == 'CommandBegin' ;
//gets just the events fired when the refreshes end
let
CommandEnds = 
RefreshesInLast24Hours
| where OperationName == 'CommandEnd' ;
//do a full outer join between the previous two tables
CommandBegins
| join kind=fullouter CommandEnds on XmlaRequestId, EventText
//select just the columns relevant to refreshes
| project ArtifactName, XmlaRequestId,
StartTime = TimeGenerated, EndTime = TimeGenerated1, 
DurationMs = DurationMs1, CpuTimeMs = CpuTimeMs1, 
Outcome = coalesce(Status1, 'In Progress'), 
ErrorCode = StatusCode1, 
WorkspaceId, WorkspaceName
| summarize StartTime=min(StartTime), EndTime=max(EndTime), DurationMs=max(DurationMs), 
CpuTimeMs=max(CpuTimeMs), ErrorCode=min(ErrorCode) 
by XmlaRequestId, ArtifactName, Outcome, WorkspaceId, WorkspaceName

Here’s an example of the output, a table with one row for each refresh in the last 24 hours:

Here’s how the query works:

  • Gets all the CommandBegin and CommandEnd events – the events that are fired when a refresh begins and ends – that occurred in the last 24 hours and stores them in two tables.
    • Only some of the CommandBegin and CommandEnd events are related to refreshes, so the query filters on the EventText column to find them.
    • I chose to limit the data to the last 24 hours to keep the amount of data displayed to a minimum and because it’s unlikely that any refresh will take more than 24 hours. If I limited the data to the last hour, for example, then the query would not return the CommandBegin events or the CommandEnd events for any refreshes that started more than an hour ago and were still in progress, so these refreshes would not be visible in the report.
  • Does a full outer join between the table with the CommandBegin events and the table with the CommandEnd events, so the query returns one row for every refresh that either started or ended in the last 24 hours.
  • Limits the columns in the results to show:
    • XmlaRequestId – the unique identifier that links all the events for a specific refresh operation
    • ArtifactName – the dataset name
    • WorkspaceId, WorkspaceName – the ID and name of the Power BI workspace where the dataset is stored
    • StartTime and EndTime – the refresh start time and end time
    • DurationMs – the duration of the refresh in milliseconds
    • CpuTimeMs – the CPU time consumed by the refresh in milliseconds
    • Outcome – whether the refresh succeeded, failed or is still in progress
    • ErrorCode – the error code that tells you why the refresh failed
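
To use this KQL query as the source of a DirectQuery table, you paste the whole thing into the third parameter of the AzureDataExplorer.Contents function, exactly as described in part 1 of this series. Here’s a cut-down sketch of the pattern, using placeholder cluster and database names – M text literals can span multiple lines, but remember to double up any double quote characters inside them:

let
  //paste the full summary KQL query from above between the quotes
  SummaryKQL = "PowerBIDatasetsWorkspace
| where TimeGenerated > ago(24h)
| where OperationName in ('CommandBegin', 'CommandEnd')",
  Source = AzureDataExplorer.Contents(
    "InsertClusterName", 
    "InsertDBName", 
    SummaryKQL, 
    [
      MaxRows                 = null, 
      MaxSize                 = null, 
      NoTruncate              = null, 
      AdditionalSetStatements = null
    ]
  )
in
  Source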

Next I created a second table to return more detailed information about a specific refresh. Here’s an example of the KQL:

PowerBIDatasetsWorkspace
| where TimeGenerated > ago(24h)
| where OperationName in 
('CommandBegin', 'CommandEnd', 
'ProgressReportEnd','Error') 
| where XmlaRequestId in 
('5306b4ef-d74a-4962-9fc0-f7a299f16d61')
| project OperationName, OperationDetailName, TimeGenerated, 
EventText, XmlaRequestId, DurationMs, CpuTimeMs

This query returns the CommandBegin, CommandEnd, ProgressReportEnd and Error events for a specific refresh (so all these events have the same XmlaRequestId value).

One last complication was that, in my Power BI dataset, I was not able to create a relationship between these two tables (I got some kind of query folding error, although I don’t know why), so in order to be able to drill from the summary table to the detail table I had to use a dynamic M parameter for the XmlaRequestId used to filter the second table (see this post for more details on how to implement dynamic M parameters when working with KQL queries).
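
Here’s a sketch of what the M for the detail table looks like with the parameter injected into the KQL – SelectedXmlaRequestId is my assumed name for a text M parameter bound to a column of XmlaRequestId values:

let
  Source = AzureDataExplorer.Contents(
    "InsertClusterName", 
    "InsertDBName", 
    //SelectedXmlaRequestId is an assumed M parameter name;
    //its value is injected into the KQL where clause
    "PowerBIDatasetsWorkspace
| where TimeGenerated > ago(24h)
| where OperationName in ('CommandBegin', 'CommandEnd', 'ProgressReportEnd', 'Error')
| where XmlaRequestId == '" & SelectedXmlaRequestId & "'
| project OperationName, OperationDetailName, TimeGenerated, EventText, XmlaRequestId, DurationMs, CpuTimeMs", 
    [
      MaxRows                 = null, 
      MaxSize                 = null, 
      NoTruncate              = null, 
      AdditionalSetStatements = null
    ]
  )
in
  Source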

Here’s the report I built on this dataset showing how you can drill from the summary page to the details page for a particular refresh:

Not exactly pretty, but I’m not much of a data visualisation person and I think tables are fine for this type of data. You can download a sample .pbix file (I couldn’t make a .pbit file work for some reason…) for this report here and prettify it yourself if you want. All you need to do is change the Cluster and Database parameters in the Power Query Editor to match the details of your Log Analytics workspace and database (see my last post for more details).

[Thanks to Chris Jones at Microsoft for his ideas and suggestions on this topic]

[Update 10th January 2022 – I’ve modified the summary KQL query based on feedback from another Microsoft colleague, Xiaodong Zhang, who pointed out that it didn’t work for TMSL scripts or datasets that use incremental refresh]

Monitor Power BI Queries And Refreshes With DirectQuery On Log Analytics, Part 1: Creating A Dataset

As a Power BI administrator you want to see what’s happening in your tenant right now: who’s running queries, which datasets are refreshing and so on. That way if a user calls you to complain that their report is slow or their dataset hasn’t refreshed yet you can start troubleshooting immediately. Power BI’s integration with Log Analytics (currently in preview with some limitations) is a great source of information for this kind of troubleshooting: it gives you the ability to send various useful Analysis Services engine events, events that give you detailed information about queries and refreshes among other things, to Log Analytics with a latency of only a few minutes. Once you’ve done that you can write KQL queries to understand what’s going on, but writing queries is time consuming – what you want, of course, is a Power BI report.

The dataset behind that Power BI report will need to be in DirectQuery mode if you want to see the most up-to-date data; unfortunately the current Log Analytics for Power BI Datasets template app is Import mode, which is only good if you want to look at long-term trends. It’s not obvious how to connect Power BI to Log Analytics in DirectQuery mode but it is possible: some time ago John White wrote a detailed blog post showing how you could use the Azure Data Explorer connector to allow Power BI to connect to Log Analytics in DirectQuery mode. If you read the comments on that post you’ll see that this method stopped working soon afterwards, although there was a workaround using an older version of the connector. The good news is that as of the December 2021 release of Power BI Desktop the bug was fixed and the details in John’s post are correct again. There is one other issue that needs mentioning: at the time of writing it only seems to be possible to create a DirectQuery dataset on Log Analytics if you specify a native KQL query, but since that can be simply the name of the table you want to query in Log Analytics that’s not much of a blocker.

Putting this all together, to be able to build a Power BI DirectQuery dataset and report to analyse query and refresh activity, you need to:

  1. Set up a Log Analytics workspace in the Azure Portal – see here for instructions.
  2. Connect the Power BI Service to this new Log Analytics workspace – see here for instructions – so that query and refresh events are logged to it.
  3. Open up Power BI Desktop and connect to the Log Analytics workspace using the Azure Data Explorer connector (the M this generates is sketched after this list). In the connection dialog:
    • Enter a URL pointing to your Log Analytics workspace in the following format in the Cluster box:
      https://ade.loganalytics.io/subscriptions/enter Azure subscription id here/resourcegroups/enter Azure resource group name here/providers/microsoft.operationalinsights/workspaces/enter Log Analytics workspace name here
    • Enter your Log Analytics workspace name in the Database box
    • In the “Table name or Azure Data Explorer query (optional)” box you need to enter a custom KQL query, which can simply be the following table name:
      PowerBIDatasetsWorkspace
    • Scroll down to the end of the dialog (which may be off the screen) and select DirectQuery for the Data Connectivity mode.

At this point you’ll have a dataset with a single table in DirectQuery mode that you can build your reports from; you might also want to use automatic page refresh to make sure the report page always shows the latest data. For example, here’s a very simple report showing query durations:

How do you interpret the data in the table in the dataset? Are there any interesting analyses you can do with more complex KQL queries or different ways of modelling the data? All good questions and ones that are out of scope for now. I’ll try to answer these questions in my next few posts though.

Inlining KQL Query Fragments Using Value.NativeQuery() In Power Query

If the title wasn’t enough to warn you, this post is only going to be of interest to M ultra-geeks and people using Power BI with Azure Data Explorer – and I know there aren’t many people in either group. However I thought the feature I’m going to show you in this post is so cool I couldn’t resist blogging about it.

Let’s say I have a table in Azure Data Explorer called Product that looks like this:

Because this table is in Azure Data Explorer I can query it with KQL, and the thing I love most about KQL as a language is the way that each query is built up as a series of piped transformations. So, for example, on the table above I could write the following query:

Product
| where Type=='Berry'
| where FruitID > 1

…and get this result:

It seems quite similar to the way M works, right? Anyway, now for the cool part. When you connect Power Query in Power BI up to Azure Data Explorer, not only does query folding take place but you can mix and match native KQL fragments (added using the Value.NativeQuery M function) and Power Query transformations created in M, and it all still folds! This completely blew my mind when I found out about it. It’s documented briefly here but the docs don’t explain everything that you can do with it.

Consider the following M query:

let
  Source
    = AzureDataExplorer.Contents(
        "https://xyz.ukwest.kusto.windows.net/", 
        "MyDatabase", 
        "Product"
      ),
  #"Filtered Rows"
    = Table.SelectRows(
        Source, 
        each ([Type] = "Berry")
      ),
  Q1 = Value.NativeQuery(
      #"Filtered Rows", 
      "| where FruitID>2"
    )
in
  Q1

There are three steps:

  1. Source connects to the Product table
  2. #"Filtered Rows" is a filter generated by the Power Query UI to filter the table down to just the rows where Type is “Berry”
  3. Q1 uses Value.NativeQuery to add a KQL filter onto the output of #"Filtered Rows", so that only the rows where FruitID is greater than 2 are returned

Here’s the KQL this folds to:

Product
| where strcmp(["Type"], ("Berry")) == 0
| where FruitID>2

You can also use Value.NativeQuery more than once. Here’s another M query that returns the same result as above:

let
  Source
    = AzureDataExplorer.Contents(
        "https://xyz.ukwest.kusto.windows.net/", 
        "MyDatabase", 
        "Product"
      ),
  Q3 = Value.NativeQuery(
      Source, 
      "| where Type=='Berry'"
    ),
  Q4 = Value.NativeQuery(
      Q3, 
      "| where FruitID>2"
    )
in
  Q4

Very useful when you want to build a query quickly using the Power Query Editor but then need to use a KQL feature that isn’t supported by Power Query.
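
For example, Power Query has no transformation that folds to KQL’s sample operator (which returns a given number of rows at random), but you can tack it onto UI-generated steps as a native fragment. A sketch using the same Product table as above:

let
  Source = AzureDataExplorer.Contents(
      "https://xyz.ukwest.kusto.windows.net/", 
      "MyDatabase", 
      "Product"
    ),
  //filter generated with the Power Query UI
  #"Filtered Rows" = Table.SelectRows(
      Source, 
      each ([Type] = "Berry")
    ),
  //KQL's sample operator has no Power Query equivalent,
  //but it can still be appended as a native fragment
  RandomRows = Value.NativeQuery(
      #"Filtered Rows", 
      "| sample 2"
    )
in
  RandomRows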

Handling Multi-select In Power BI Dynamic M Parameters

Even though the documentation for dynamic M parameters does mention how to handle multi-select in the M code for your Power Query queries, I thought it would be useful to provide a detailed example of how to do this and explain what happens behind the scenes when you use multi-select.

Once again I’m going to use Azure Data Explorer as my DirectQuery data source in this post. Consider the following simple KQL query, which returns a list of counties in the UK and the average price paid for real estate in them:

pricepaid
| where county in ('DEVON', 'CORNWALL', 'KENT')
| summarize avg(price) by county

The third line of this query uses a combination of the KQL where and in operators to filter the list of counties in a way that’s very similar to other query languages such as SQL. The challenge is to write an M expression that will generate the comma-delimited list of county names in parentheses at the end of this line.

Taking the query above and generating a non-dynamic DirectQuery table in Power BI is straightforward, and let’s assume that you have already created an M parameter called SelectedCounty:

… a table of county names:

…and bound the County column of this table to the SelectedCounty M parameter and turned on the Multi-select option in Model View in the main Power BI Desktop window:

Now, the confusing thing for me when I first started to look at this problem was that the SelectedCounty M parameter I created was of type Text, while the example code in the documentation was written for an M parameter of type list – and in fact it isn’t possible to create an M parameter of type list in the Power Query Editor (at least not at the time of writing). It turns out that when the Multi-select option is turned on, Power BI is able to send a value of type list to the parameter regardless of the type you have defined for it.

With that knowledge, here’s an example of an M query that generates the KQL query using the parameter:

let
  CountyList = 
    if 
    //check to see if the parameter is a list
      Type.Is(
        Value.Type(SelectedCounty), 
        List.Type
      ) then 
        //if it is a list
        let
          //add single quotes around each value in the list
          AddSingleQuotes = List.Transform(
              SelectedCounty, 
              each "'" & _ & "'"
            ),
          //then turn it into a comma-delimited list
          DelimitedList = Text.Combine(
              AddSingleQuotes, 
              ","
            )
        in
          DelimitedList
    else 
      //if the parameter isn't a list
      //just add single quotes around the parameter value
      "'" & SelectedCounty & "'",
  //generate and run the KQL query
  Source = AzureDataExplorer.Contents(
      "https://mycluster.northeurope.kusto.windows.net", 
      "pricepaid", 
      "pricepaid#(lf)
       | where county in 
       ("
        & CountyList &
       ")#(lf)
       | summarize avg(price) by county", 
      [
        MaxRows                 = null, 
        MaxSize                 = null, 
        NoTruncate              = null, 
        AdditionalSetStatements = null
      ]
    )
in
  Source

Note that the CountyList step has to check the data type of the parameter using an if statement, because in the Query Editor it will always be Text whereas in the report it will be a list if multi-select is turned on and Text if not.
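
You can test the CountyList logic on its own in the Power Query Editor by substituting a hard-coded list for the parameter, simulating what Power BI sends when two items are selected in the slicer. A minimal sketch:

let
  //simulates the value passed to SelectedCounty when
  //DEVON and KENT are selected in a multi-select slicer
  SelectedCounty = {"DEVON", "KENT"},
  //add single quotes around each value in the list
  AddSingleQuotes = List.Transform(
      SelectedCounty, 
      each "'" & _ & "'"
    ),
  //then turn it into a comma-delimited list
  DelimitedList = Text.Combine(
      AddSingleQuotes, 
      ","
    )
in
  //returns 'DEVON','KENT'
  DelimitedList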

Here’s the final report showing a multiselect slicer passing values into this query:

[Thanks to Ravi Kiran Vemulapalli and Sujata Narayana for their help with this post]

Geospatial Analysis With Azure Data Explorer, Power BI And Dynamic M Parameters

Since last week’s blog post about dynamic M parameters generated so much interest, this week I thought I’d give you another example of something cool you can do with them when you’re using Azure Data Explorer (ADX) as a DirectQuery source in Power BI: geospatial analysis.

Let’s say you work for a chain of supermarkets and want to use Power BI to see what other competing stores are close to one of your stores. First of all you need some data, and in this case my data source was this list of British supermarkets and their locations, published by the nice people at Geolytix, and called Retail Points. Once again I loaded the csv file into ADX and was able to write a KQL query using the geo_point_in_circle() function to filter the list of supermarkets down to those within a specified distance of a given supermarket. For example, here’s a simple KQL query that finds all the supermarkets within a 5000m radius of the location of my local Marks and Spencer store in Amersham:

let centrestore =  
retailpoints
| where store_name == "M&S Amersham SF"
| project long_wgs, lat_wgs;
let centrelat = toscalar( centrestore | summarize max(lat_wgs));
let centrelong = toscalar ( centrestore | summarize max(long_wgs));
retailpoints
| where geo_point_in_circle(long_wgs, lat_wgs, centrelong, centrelat, 5000)

[Apologies if this isn’t the greatest KQL query – I’m still learning]

I was then able to create two M parameters called selectedstore and selectedradius and bind them to two other tables in my dataset using dynamic M parameters so that the end user could use slicers to select:

  • The store they want to use as the central point of their analysis
  • The size in metres of the radius of the circle around the central point within which other stores should be displayed

These M parameters could then be injected into the KQL query of a DirectQuery table using the following M query:

let
  Source = AzureDataExplorer.Contents(
      "https://mycluster.northeurope.kusto.windows.net", 
      "retailpoints", 
      Text.Format(
          "declare query_parameters(filterstore:string = ""#[storeMparametername]"", centreradius:int = #[radiusMparametername]);#(lf)let centrestore =  #(lf)retailpoints#(lf)| where store_name == filterstore#(lf)| project long_wgs, lat_wgs;#(lf)let centrelat = toscalar( centrestore | summarize max(lat_wgs));#(lf)let centrelong = toscalar ( centrestore | summarize max(long_wgs));#(lf)retailpoints#(lf)| where geo_point_in_circle(long_wgs, lat_wgs, centrelong, centrelat, centreradius)", 
          [
            storeMparametername  = selectedstore, 
            radiusMparametername = selectedradius
          ]
        ), 
      [
        MaxRows                 = null, 
        MaxSize                 = null, 
        NoTruncate              = null, 
        AdditionalSetStatements = null
      ]
    )
in
  Source

There are two important things to notice here:

  • I declared KQL query parameters to hold the selected store name and selected radius at the start of the KQL query. As the documentation notes, this is important to guard against KQL injection attacks.
  • I used the M Text.Format function to inject the M parameter values into the KQL query parameter declarations, which was easier and cleaner than generating the KQL query string by concatenating text (see the isolated example after this list).
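
If you haven’t used Text.Format before, here’s the pattern in isolation: each #[name] placeholder in the format string is replaced by the corresponding field of the record passed as the second parameter, and doubled-up "" characters produce literal double quotes in the output. A sketch you can paste into a blank query:

let
  Query = Text.Format(
      "declare query_parameters(filterstore:string = ""#[storename]"", centreradius:int = #[radius]);", 
      [
        storename = "M&S Amersham SF", 
        radius    = 5000
      ]
    )
in
  //returns: declare query_parameters(filterstore:string = "M&S Amersham SF", centreradius:int = 5000);
  Query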

And here’s the report I built, for example showing how you can select an Aldi store in Aberdeen and display all stores within a 1km, 3km and 7km radius of that store:

Not the most sophisticated geospatial report in the world, I admit, and I know very few people are using Azure Data Explorer as a data source for Power BI right now, but I do think it is a fun example of the kind of analysis that dynamic M parameters now make possible.

Why I’m Excited About Dynamic M Parameters In Power BI

My favourite feature in the October 2020 release of Power BI is undoubtedly dynamic M parameters. However, before I explain why, I’m going to put my hands up and admit that we screwed up the communication around its launch and didn’t explain its limitations properly – which led to some understandable disappointment and confusion.

Although the blog post and the docs have now been updated to make it clear that this feature doesn’t work with many popular data sources like SQL Server, Synapse, Oracle, Teradata and SAP (see here for the full details), I thought I’d explain why. It’s because there are two sets of connectors that Power BI uses for DirectQuery: an older set that was built some time ago for the data sources originally supported by Analysis Services Tabular, and a newer set built in Power Query’s M language. Dynamic M parameters only work for this newer set of connectors; if you want to see this feature extended so it works for the older connectors too then please vote on ideas.powerbi.com, talk to your Microsoft account team, call Satya or whatever you normally do.

OK, so why am I excited? Well, because dynamic M parameters let you do a whole bunch of cool things that weren’t possible before. My original plan for this blog post was to show how they can be used to change the dimension column used in a visual dynamically but Mim beat me to it with this blog post. Instead in this post I’m going to show you how you can use them to create something like dynamic calculated columns and I also want to make a more general point that will Blow Your Mind (hopefully).

For my example I’m going to use Azure Data Explorer (ADX) as my data source. There are a few reasons why: it’s something I’ve been interested to learn more about for a while, and the ADX team recently announced a bunch of new features which make me think it will be increasingly useful in BI scenarios; it will be important to know it if you’re using the new Azure Monitor integration for Power BI log data; and, crucially, it’s one of the few M-based connectors where you can write your own queries in DirectQuery mode (although this is coming to Snowflake and Redshift next year too).

My source data is from the UK Land Registry Price Paid dataset, and for my demo report I loaded a table containing around 790,000 real estate transactions into ADX. Each real estate transaction has the price paid for the property:

One obvious way to analyse this data is to build a histogram and show the number of transactions that occur in price bins such as £0-£100000, £100000-£200000, £200000-£300000 and so on. This is quite easy to do in Power BI, for example using calculated columns in Import mode, unless you want to allow your users to be able to change the bin sizes themselves. Dynamic M parameters give you an easy way to solve this problem though.

KQL, the query language for ADX, has a really nice way of generating bins for histograms using the summarize operator and bin() function. For example, the following KQL query does something like a SQL group by and shows the number of real estate transactions for each county and price bin, with bins in £25000 increments:

pricepaid 
| summarize Transactions=count() 
by county, bin(price, 25000)

In Power BI I created a parameter called binsizes:

…and then created a parameterised version of the KQL query above to populate a DirectQuery table. Here’s the M code for this table:

let
    Source = 
     AzureDataExplorer.Contents(
      "https://mycluster.northeurope.kusto.windows.net", 
      "pricepaid", 
      "pricepaid | summarize Transactions=count() by county, bin(price, " 
      & Number.ToText(binsizes) & 
      ")", 
      [MaxRows=null, MaxSize=null, NoTruncate=null, AdditionalSetStatements=null]
     )
in
    Source

Next I created a new table with the possible bin sizes that the user can choose using a DAX calculated table:

And then finally linked the column in this new table with the M parameter:

Here’s the final report, where you can see the number of transactions in each price bin and use slicers to change the price bins and change the county you’re looking at:

Apart from being a new solution to an old problem, what does this mean? Well, with a data source like ADX where M parameters are supported and you can write your own queries in DirectQuery mode, you can do whatever you want in the query so long as it always returns the same columns. As well as creating parameterised queries, in ADX you could also create a user-defined function (something like table-valued functions in SQL Server), pass parameters into it using M parameters, and have a totally dynamically generated fact table in your dataset. Either way, using dynamic M parameters in this way allows you to use KQL’s specialised functions for basket analysis or geospatial analysis to do things that are otherwise impossible in Power BI. I’m sure similar things are possible with other data sources too. And that, my friends, is why I’m so excited about dynamic M parameters.
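
As a sketch of that last idea: if you created an ADX user-defined function – say one called PricePaidHistogram, a hypothetical name, that takes a bin size and returns the binned fact table with a fixed set of columns – you could call it from M and drive it with the same binsizes parameter:

let
    Source = 
     AzureDataExplorer.Contents(
      "https://mycluster.northeurope.kusto.windows.net", 
      "pricepaid", 
      //PricePaidHistogram is a hypothetical ADX function that
      //returns the same columns whatever bin size is passed in
      "PricePaidHistogram(" 
      & Number.ToText(binsizes) & 
      ")", 
      [MaxRows=null, MaxSize=null, NoTruncate=null, AdditionalSetStatements=null]
     )
in
    Source

Same columns every time, but a completely different query whenever the slicer selection changes.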
