New Option To Solve Problems With Power Query Not Loading All Data From An Excel Worksheet

Some interesting new Power Query functionality shipped in the December release of Power BI Desktop (it’s also in the build of Excel I’m running) that didn’t get announced in the release blog post but which will be useful for anyone using Excel as a data source. It is fully documented, but I thought I’d blog about it anyway to raise awareness.

It’s a new option for the Excel.Workbook function called InferSheetDimensions. Here’s what the docs say:

Can be null or a logical (true/false) value indicating whether the area of a worksheet that contains data should be inferred by reading the worksheet itself, rather than by reading the dimensions metadata from the file. This can be useful in cases where the dimensions metadata is incorrect. Note that this option is only supported for Open XML Excel files, not for legacy Excel files. Default: false.

There’s also a much more detailed explanation on this page:

https://docs.microsoft.com/en-us/power-query/connectors/excel#troubleshooting

…in the section “Missing or incomplete Excel data” on how to troubleshoot issues where Power Query isn’t loading all the data from your worksheet.

Here’s a quick summary of the problem. Open XML format Excel workbooks (ie xlsx and xlsm format files) have some metadata (the “dimensions metadata” referred to in the docs) in them that acts as a kind of bounding box describing the range of cells in each worksheet that contain values. By default, Power Query uses this metadata when it gets data from an Excel worksheet – although not when it gets data from an Excel table or named range. Most of the time this metadata is reliable but in some cases, for example where an xlsx file has been generated by third-party software that has “export to Excel” functionality, it contains inaccurate information. This doesn’t affect Excel, which still loads the workbook, but it means that Power Query may not load all the data present in a worksheet. In these cases, setting InferSheetDimensions=true in Excel.Workbook causes Power Query to ignore the dimensions metadata and load all the data from the worksheet.
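
Here’s a quick sketch of what this looks like in M, with a hypothetical file path and sheet name; as far as I can tell from the docs, the option goes in the record of options that can be passed as the second parameter of Excel.Workbook:

//A sketch with a hypothetical file path and sheet name:
//setting InferSheetDimensions to true makes Power Query scan the
//worksheet itself instead of trusting the dimensions metadata
let
  Source = Excel.Workbook(
      File.Contents("C:\ProblemFile.xlsx"), 
      [InferSheetDimensions = true]
    ),
  Sheet1 = Source{[Item = "Sheet1", Kind = "Sheet"]}[Data]
in
  Sheet1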

The docs also point out that if the dimensions metadata describes a range that is much larger than the actual populated range of cells in the worksheet, this can cause performance problems when loading data. While this can be fixed by manually editing the workbook as shown here, that’s not much help if you’re dealing with Excel files generated by buggy third-party software. Since Excel is already one of the slowest data sources you can use with Power Query, I guess this is another reason to use text formats like CSV with Power Query instead.

Handling Multiple URL Query Parameters With The Same Name Using Web.Contents In Power Query/Power BI

Some time ago I wrote a pair of popular posts about using the Query and RelativePath options of the Web.Contents function in Power Query and why they are important for dataset refresh. I have recently learned something extra about this subject which merits a new post, though: how to handle multiple URL query parameters with the same name.

In the examples for this post I’ll be using a free, fake web service available at http://jsonplaceholder.typicode.com/ which requires no authentication so you will be able to run the code I show yourself. Let’s start by considering the following call to the comments resource of this API:

http://jsonplaceholder.typicode.com/comments?postId=1

In M, you can use the Web.Contents function and the Query option to call the API like so:

//Generates the URL http://jsonplaceholder.typicode.com/comments?postId=1
Web.Contents("http://jsonplaceholder.typicode.com/comments",[Query=[postId="1"]])

Now this API – unlike many others – allows you to pass multiple URL query parameters of the same name. For example this is a valid call:

http://jsonplaceholder.typicode.com/comments?postId=1&postId=2

The problem is, if you try to do the following in M:

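//This raises an error: a record cannot have two fields with the same name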
Web.Contents("http://jsonplaceholder.typicode.com/comments",[Query=[postId="1",postId="2"]])

…you’ll get an error, because the Query option takes a record and a record can’t have two fields with the same name.

Here’s where it gets interesting. In the working example above, the postId field in the Query record contains a text value. The field can contain a list of text values instead, though, and this is how you solve the problem:

//Generates the URL http://jsonplaceholder.typicode.com/comments?postId=1&postId=2
Web.Contents("http://jsonplaceholder.typicode.com/comments",[Query=[postId={"1","2"}]])

This results in a URL with two query parameters that are both called postId but have the values 1 and 2 respectively.

There’s something else worth mentioning about this too. Sometimes when you’re generating a URL you may not want to add a query parameter to it if a value is null. One way of handling this is to start with an empty record and then add fields to it using the Record.AddField function, but using an empty list provides another approach. For example:

//Generates the URL http://jsonplaceholder.typicode.com/comments
//and *not* http://jsonplaceholder.typicode.com/comments?postId=
Web.Contents("http://jsonplaceholder.typicode.com/comments",[Query=[postId={}]])

…generates a URL without the postId query parameter in it. This means you can write a simple function with an optional parameter for postId like this:

(optional myPostId as text) =>
Json.Document(
    Web.Contents("http://jsonplaceholder.typicode.com/comments",[Query=[postId=myPostId ?? {}]])
    )

If a text value is passed to myPostId then the postId query parameter is added to the URL; if no value is passed, myPostId is null and the ?? null coalescing operator (see Ben Gribaudo’s post here for more information on this operator) can be used to replace it with an empty list.
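
For comparison, here’s a sketch of the Record.AddField approach mentioned above – one way you might write it, not the only way:

//A sketch of the alternative approach: build the Query record
//dynamically, only adding the postId field when a value is passed
(optional myPostId as text) =>
  let
    QueryRecord = 
      if myPostId = null then 
        []
      else 
        Record.AddField([], "postId", myPostId),
    Result = Json.Document(
        Web.Contents(
            "http://jsonplaceholder.typicode.com/comments", 
            [Query = QueryRecord]
          )
      )
  in
    Result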

One last thing to mention: if you need to see the web service calls generated by Web.Contents when you’re testing in the Power Query Editor, you can use the query diagnostics functionality as I show here.

[I’m extremely grateful to Dave Maiden for letting me know about all this – it answers a question I have been asked several times in the past]

Inlining KQL Query Fragments Using Value.NativeQuery() In Power Query

If the title wasn’t enough to warn you, this post is only going to be of interest to M ultra-geeks and people using Power BI with Azure Data Explorer – and I know there aren’t many people in either group. However I thought the feature I’m going to show you in this post is so cool I couldn’t resist blogging about it.

Let’s say I have a table in Azure Data Explorer called Product that looks like this:

Because this table is in Azure Data Explorer I can query it with KQL, and the thing I love most about KQL as a language is the way that each query is built up as a series of piped transformations. So, for example, on the table above I could write the following query:

Product
| where Type=='Berry'
| where FruitID > 1

…and get this result:

It seems quite similar to the way M works, right? Anyway, now for the cool part. When you connect Power Query in Power BI up to Azure Data Explorer, not only does query folding take place but you can mix and match native KQL fragments (added using the Value.NativeQuery M function) and Power Query transformations created in M, and it all still folds! This completely blew my mind when I found out about it. It’s documented briefly here but the docs don’t explain everything that you can do with it.

Consider the following M query:

let
  Source
    = AzureDataExplorer.Contents(
        "https://xyz.ukwest.kusto.windows.net/", 
        "MyDatabase", 
        "Product"
      ),
  #"Filtered Rows"
    = Table.SelectRows(
        Source, 
        each ([Type] = "Berry")
      ),
  Q1 = Value.NativeQuery(
      #"Filtered Rows", 
      "| where FruitID>2"
    )
in
  Q1

There are three steps:

  1. Source connects to the Product table
  2. #”Filtered Rows” is a filter generated by the Power Query UI to filter the table down to just the rows where Type is “Berry”
  3. Q1 uses Value.NativeQuery to add a KQL filter onto the output of #”Filtered Rows”, so that only the rows where FruitID is greater than 2 are returned

Here’s the KQL this folds to:

Product
| where strcmp(["Type"], ("Berry")) == 0
| where FruitID>2

You can also use Value.NativeQuery more than once. Here’s another M query that returns the same result as above:

let
  Source
    = AzureDataExplorer.Contents(
        "https://xyz.ukwest.kusto.windows.net/", 
        "MyDatabase", 
        "Product"
      ),
  Q3 = Value.NativeQuery(
      Source, 
      "| where Type=='Berry'"
    ),
  Q4 = Value.NativeQuery(
      Q3, 
      "| where FruitID>2"
    )
in
  Q4

Very useful when you want to build a query quickly using the Power Query Editor but then need to use a KQL feature that isn’t supported by Power Query.
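
For example, here’s a sketch (reusing the same cluster and table as above) that follows a UI-generated filter with KQL’s sample operator, which returns random rows and has no direct Power Query equivalent:

//A sketch: a UI-generated filter followed by a KQL-only feature
let
  Source
    = AzureDataExplorer.Contents(
        "https://xyz.ukwest.kusto.windows.net/", 
        "MyDatabase", 
        "Product"
      ),
  #"Filtered Rows"
    = Table.SelectRows(
        Source, 
        each ([Type] = "Berry")
      ),
  RandomRow = Value.NativeQuery(
      #"Filtered Rows", 
      "| sample 1"
    )
in
  RandomRow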

Delaying Power BI Dataset Refresh Until The Source Data Is Ready

This week a customer came to me with the following problem: they had scheduled the refresh of their dataset but their source data wasn’t always ready in time, so the old data was being loaded by mistake. The best solution here is to use some kind of external service (for example Power Automate) to poll the data source regularly to see if it’s ready, and then to refresh the dataset via the Power BI REST API when it is. However, it got me thinking about a different way of tackling this: is it possible to write some M code that will do the same thing? It turns out that it is, but it’s quite complicated – so I don’t recommend you use the code below in the real world. Nevertheless I wanted to write up the solution I came up with because it’s interesting and you never know, it might be useful one day.

Here’s the scenario I implemented. Using a SQL Server data source, let’s say that you want to load the result of the following SQL query against the AdventureWorksDW2017 database into Power BI:

SELECT DISTINCT [EnglishDayNameOfWeek] FROM DimDate

However, let’s also say that the data in this table may not be ready at the time when dataset refresh is scheduled; when the data is ready, the sole value in the sole column of another table, called ContinueFlagTable, will be changed from False to True:

Therefore when dataset refresh starts we need to check the value from the ContinueFlagTable regularly; if it is False we need to wait for a given amount of time and then check again; if it is True the data can be loaded; if the flag hasn’t changed to True after checking a certain number of times then we need to raise an error.

Here’s the full M code of the Power Query query:

let
  //The SQL Server database
  Source = Sql.Database(
      "MyServerName", 
      "AdventureWorksDW2017"
    ),
  //The query to load into the dataset
  ResultQuery
    = Value.NativeQuery(
        Source, 
        "SELECT DISTINCT [EnglishDayNameOfWeek] FROM DimDate"
      ),
  //The number of times to check the Continue flag
  MaxIterations = 3,
  //The number of seconds to wait before checking the Continue flag
  DelaySeconds = 5,
  //Check the Continue flag
  Iterate = List.Generate(
      () => 1, 
      each (
      //Query the ContinueFlag table
      //after waiting the specified number of seconds
      //and keep trying until either it returns true
      //or we reach the max number of times to check
      (Function.InvokeAfter(
          () => 
            let
              dbo_ContinueFlag
                = Value.NativeQuery(
                    Source, 
                    "SELECT [ContinueFlagColumn] from ContinueFlagTable where -1<>@param", 
                    [
                      param
                        = _
                    ]
                  ),
              Continue
                = dbo_ContinueFlag{0}[ContinueFlagColumn]
            in
              Continue, 
          #duration(
              0, 
              0, 
              0, 
              DelaySeconds
            )
        )
        = false
      )
        and (_ < MaxIterations)
      ), 
      each _ + 1
    ),
  //Find how many times the ContinueFlag was checked
  NumberOfIterations
    = List.Max(Iterate),
  //Did we reach the max number of checks?
  ReturnError
    = NumberOfIterations
      = MaxIterations
      - 1,
  //Table type of the table to load into the dataset
  ReturnTableType
    = type table [
      EnglishDayNameOfWeek = Text.Type
    ],
  //Error message to return if the
  //max number of checks is reached
  ErrorMessage
    = error "After waiting "
      &amp; Text.From(
        MaxIterations
        * DelaySeconds
      )
      &amp; " seconds your data is still not ready to be loaded",
  //Handle Power BI Desktop's behaviour of asking for
  //the top 0 rows from the table
  //before loading the actual data
  OverrideZeroRowFilter = 
    if ReturnError then 
      ErrorMessage
    else 
      Table.View(
          null, 
          [
            GetType = () => 
              ReturnTableType, 
            GetRows = () => 
              ResultQuery, 
            OnTake
              = (
                count as number
              ) => 
                if count = 0 then 
                  #table(
                      ReturnTableType, 
                      {}
                    )
                else 
                  Table.FirstN(
                      ResultQuery, 
                      count
                    )
          ]
        )
in
  OverrideZeroRowFilter

There’s a lot to explain here:

  • Everything is in a single query, and this is deliberate: I found it was the only way I could make sure that the query to load the data (ie the query in the ResultQuery step) is run only once, after the check on the ContinueFlagTable has returned true. If I split the code to get the data off into a separate query, I found that it got queried before any checks on ContinueFlagTable; this seemed to be something to do with the formula firewall, but I’m not sure.
  • I’ve used List.Generate to do the polling of ContinueFlagTable (there’s a stripped-down sketch of the pattern just after this list). There’s a good example of how to use it to implement Do Loops here.
  • I’ve used Function.InvokeAfter to add the delays in between checks on ContinueFlagTable.
  • I found that if I ran the same query to check ContinueFlagTable, after the first execution the result was cached and the query couldn’t tell if the flag changed after that. Therefore I wrote a SQL query with a WHERE clause that was different each time but which always returned true; I did this by passing the iteration number into the query via a parameter, and I passed the parameter in using the Value.NativeQuery function.
  • It’s possible to raise an error when needed using the error keyword, described here.
  • The OverrideZeroRowFilter step at the end uses the technique I blogged about here to deal with the fact that Power BI Desktop asks for the top 0 rows from a table to get the schema before it loads the data.
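
To make the polling pattern easier to see in isolation, here’s a stripped-down sketch of the List.Generate/Function.InvokeAfter combination, with a dummy CheckFlag function standing in for the query against ContinueFlagTable:

//A simplified sketch of the polling loop used above.
//CheckFlag is a dummy stand-in for the query against
//ContinueFlagTable: here the flag becomes true on the second check
let
  CheckFlag = (iteration as number) => iteration >= 2,
  MaxIterations = 3,
  DelaySeconds = 5,
  Iterate = List.Generate(
      //start counting at 1
      () => 1, 
      //keep iterating while the flag is still false, checking it
      //after a delay each time, and while we are under the limit
      each (Function.InvokeAfter(
          () => CheckFlag(_), 
          #duration(0, 0, 0, DelaySeconds)
        ) = false)
        and (_ < MaxIterations), 
      //increment the counter
      each _ + 1
    ),
  NumberOfChecks = List.Max(Iterate)
in
  NumberOfChecks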

Refreshing the table in Power BI Desktop with the value True in ContinueFlagTable loads the data successfully; in Profiler you can see one query to ContinueFlagTable and one query to get the data to be loaded from the DimDate table:

Refreshing the table with the value False in ContinueFlagTable shows the specified number of queries to ContinueFlagTable and no query to DimDate:

Here’s the error message you get in Power BI Desktop when this happens:

Here’s the same error in the Power BI Service when you try to refresh the dataset:

Handling Multi-select In Power BI Dynamic M Parameters

Even though the documentation for dynamic M parameters does mention how to handle multi-select in the M code for your Power Query queries, I thought it would be useful to provide a detailed example of how to do this and explain what happens behind the scenes when you use multi-select.

Once again I’m going to use Azure Data Explorer as my DirectQuery data source in this post. Consider the following simple KQL query, which returns a list of counties in the UK and the average price paid for real estate in them:

pricepaid
| where county in ('DEVON', 'CORNWALL', 'KENT')
| summarize avg(price) by county

The third line of this query uses a combination of the KQL where and in operators to filter the list of counties in a way that’s very similar to other query languages such as SQL. The challenge is to write an M expression that will generate the comma-delimited list of county names in parentheses at the end of this line.
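
The heart of the solution is just List.Transform and Text.Combine; for example, this standalone expression returns the text 'DEVON','CORNWALL','KENT':

//A sketch of the core expression: put single quotes around each
//value, then combine them into one comma-delimited piece of text
Text.Combine(
    List.Transform(
        {"DEVON", "CORNWALL", "KENT"}, 
        each "'" & _ & "'"
      ), 
    ","
  )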

Taking the query above and generating a non-dynamic DirectQuery table in Power BI is straightforward, and let’s assume that you have already created an M parameter called SelectedCounty:

… a table of county names:

…and bound the County column of this table to the SelectedCounty M parameter and turned on the Multi-select option in Model View in the main Power BI Desktop window:

Now the confusing thing for me, when I first started to look at this problem, was that the M parameter called SelectedCounty I created was of type Text but the example code in the documentation was written for an M parameter of type list – when in fact it isn’t possible to create an M parameter of type list in the Power Query Editor (at least not at the time of writing). It turns out that when the Multi-select option is turned on Power BI is able to send a value of type list to the parameter regardless of what type you have defined for it.

With that knowledge here’s an example of an M query to generate the query using the parameter:

let
  CountyList = 
    if 
    //check to see if the parameter is a list
      Type.Is(
        Value.Type(SelectedCounty), 
        List.Type
      ) then 
        //if it is a list
        let
          //add single quotes around each value in the list
          AddSingleQuotes = List.Transform(
              SelectedCounty, 
              each "'" &amp; _ &amp; "'"
            ),
          //then turn it into a comma-delimited list
          DelimitedList = Text.Combine(
              AddSingleQuotes, 
              ","
            )
        in
          DelimitedList
    else 
      //if the parameter isn't a list
      //just add single quotes around the parameter value
      "'" &amp; SelectedCounty &amp; "'",
  //generate and run the KQL query
  Source = AzureDataExplorer.Contents(
      "https://mycluster.northeurope.kusto.windows.net", 
      "pricepaid", 
      "pricepaid#(lf)
       | where county in 
       ("
        &amp; CountyList &amp;
       ")#(lf)
       | summarize avg(price) by county", 
      [
        MaxRows                 = null, 
        MaxSize                 = null, 
        NoTruncate              = null, 
        AdditionalSetStatements = null
      ]
    )
in
  Source

Note that the CountyList step has to check the data type of the parameter using an if statement, because in the Query Editor it will always be Text whereas in the report it will be a list if multi-select is turned on and Text if not.

Here’s the final report showing a multiselect slicer passing values into this query:

[Thanks to Ravi Kiran Vemulapalli and Sujata Narayana for their help with this post]

Why I’m Excited About Dynamic M Parameters In Power BI

My favourite feature in the October 2020 release of Power BI is undoubtedly dynamic M parameters. However, before I explain why, I’m going to put my hands up and admit that we screwed up the communication around its launch and didn’t explain its limitations properly – which led to some understandable disappointment and confusion.

Although the blog post and the docs have now been updated to make it clear that this feature doesn’t work with many popular data sources like SQL Server, Synapse, Oracle, Teradata and SAP (see here for the full details), I thought I’d explain why. It’s because there are two sets of connectors that Power BI uses for DirectQuery: an older set that was built some time ago for the data sources originally supported by Analysis Services Tabular, and a newer set built in Power Query’s M language. Dynamic M parameters only work for this newer set of connectors; if you want to see this feature extended so it works for the older connectors too then please vote on ideas.powerbi.com, talk to your Microsoft account team, call Satya or whatever you normally do.

OK, so why am I excited? Well, because dynamic M parameters let you do a whole bunch of cool things that weren’t possible before. My original plan for this blog post was to show how they can be used to change the dimension column used in a visual dynamically but Mim beat me to it with this blog post. Instead in this post I’m going to show you how you can use them to create something like dynamic calculated columns and I also want to make a more general point that will Blow Your Mind (hopefully).

For my example I’m going to use Azure Data Explorer (ADX) as my data source. There are a few reasons why: it’s something I’ve been interested to learn more about for a while, and the ADX team recently announced a bunch of new features which make me think it will be increasingly useful in BI scenarios; it will be important to know it if you’re using the new Azure Monitor integration for Power BI log data; and, crucially, it’s one of the few M-based connectors where you can write your own queries in DirectQuery mode (although this is coming to Snowflake and Redshift next year too).

My source data is from the UK Land Registry Price Paid dataset, and for my demo report I loaded a table containing around 790,000 real estate transactions into ADX. Each real estate transaction has the price paid for the property:

One obvious way to analyse this data is to build a histogram and show the number of transactions that occur in price bins such as £0-£100000, £100000-£200000, £200000-£300000 and so on. This is quite easy to do in Power BI, for example using calculated columns in Import mode, unless you want to allow your users to be able to change the bin sizes themselves. Dynamic M parameters give you an easy way to solve this problem though.

KQL, the query language for ADX, has a really nice way of generating bins for histograms using the summarize() operator and bin() function. For example, the following KQL query does something like a SQL group by and shows the number of real estate transactions for each county and bins based on price in £25000 increments:

pricepaid 
| summarize Transactions=count() 
by county, bin(price, 25000)

In Power BI I created a parameter called binsizes:

…and then created a parameterised version of the KQL query above to populate a DirectQuery table. Here’s the M code for this table:

let
    Source = 
     AzureDataExplorer.Contents(
      "https://mycluster.northeurope.kusto.windows.net", 
      "pricepaid", 
      "pricepaid | summarize Transactions=count() by county, bin(price, " 
      &amp; Number.ToText(binsizes) &amp; 
      ")", 
      [MaxRows=null, MaxSize=null, NoTruncate=null, AdditionalSetStatements=null]
     )
in
    Source

Next I created a new table with the possible bin sizes that the user can choose using a DAX calculated table:

And then finally linked the column in this new table with the M parameter:

Here’s the final report, where you can see the number of transactions in each price bin and use slicers to change the price bins and change the county you’re looking at:

Apart from being a new solution to an old problem, what does this mean? Well, with a data source like ADX where M parameters are supported and you can write your own queries in DirectQuery mode, you can do whatever you want in the query so long as it always returns the same columns. As well as creating parameterised queries, in ADX you could also create a user-defined function (something like table-valued functions in SQL Server), pass parameters into it using M parameters, and have a totally dynamically generated fact table in your dataset. Either way, using dynamic M parameters in this way allows you to use KQL’s specialised functions for basket analysis or geospatial analysis to do things that are otherwise impossible in Power BI. I’m sure similar things are possible with other data sources too. And that, my friends, is why I’m so excited about dynamic M parameters.
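
As a postscript to that last point, here’s a sketch of what calling an ADX user-defined function from M might look like – MyFactFunction is an invented name, and I haven’t tested this exact pattern:

//A sketch only: MyFactFunction is a hypothetical user-defined
//function in ADX that takes a bin size and returns a fact table
//with a fixed set of columns
let
    Source = 
     AzureDataExplorer.Contents(
      "https://mycluster.northeurope.kusto.windows.net", 
      "pricepaid", 
      "MyFactFunction(" & Number.ToText(binsizes) & ")", 
      [MaxRows=null, MaxSize=null, NoTruncate=null, AdditionalSetStatements=null]
     )
in
    Source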

A Function To Visualise Parallelism In Power Query Diagnostics Data

Most of the time I’ve spent looking at Power Query Query Diagnostics data, I’ve been looking at data for a single query. Over the past few days though I’ve spent some time investigating what this data shows for all the queries that are executed for a single dataset refresh. To help me do this I wrote the following M function:

(InputTable as table) => 
  let
    Source = InputTable,
    EarliestStart = List.Min(Source[Start Time]),
    AddRelativeStart = Table.AddColumn(
        Source, 
        "Relative Start", 
        each [Start Time] - EarliestStart
      ),
    AddRelativeEnd = Table.AddColumn(
        AddRelativeStart, 
        "Relative End", 
        each [End Time] - EarliestStart
      ),
    SetToDurations = Table.TransformColumnTypes(
        AddRelativeEnd, 
        {
          {"Relative Start", type duration}, 
          {"Relative End", type duration}
        }
      ),
    CalculatedTotalSeconds
      = Table.TransformColumns(
          SetToDurations, 
          {
            {
              "Relative Start", 
              Duration.TotalSeconds, 
              type number
            }, 
            {
              "Relative End", 
              Duration.TotalSeconds, 
              type number
            }
          }
        ),
    GroupedRows = Table.Group(
        CalculatedTotalSeconds, 
        {"Id", "Query"}, 
        {
          {
            "Relative Start", 
            each List.Min([Relative Start]), 
            type number
          }, 
          {
            "Relative End", 
            each List.Max([Relative End]), 
            type number
          }
        }
      ),
    MergeColumns = Table.AddColumn(
        GroupedRows, 
        "Id Query", 
        each Text.Combine({[Id], [Query]}, " "), 
        type text
      ),
    IdToNumber = Table.TransformColumnTypes(
        MergeColumns, 
        {{"Id", type number}}
      )
  in
    IdToNumber

If you invoke this function and pass in the “Detailed” query diagnostics data query:

…you’ll end up with a query that gives you a table that looks something like this:

This table has one row for each Power Query query that got executed while the diagnostics trace was running, an Id column to uniquely identify each execution, the name of the query executed, an Id Query column that concatenates the previous two columns, and Relative Start and Relative End columns that give you the number of seconds from the start time of the first query executed up to the start time and end time of the query on the current row.
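
For reference, if the function is saved as a query called VisualiseParallelism and the detailed diagnostics output is in a query called #"Diagnostics_Detailed" – both names are hypothetical – then invoking it is a one-liner:

//Both query names here are hypothetical
VisualiseParallelism(#"Diagnostics_Detailed")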

Loading the data into Power BI allows you to build a stacked bar chart with Id Query on the axis and Relative Start and Relative End together in the values:

If you then set the Sort By Column property of Id Query to the Id column (which is numeric, so this ensures that the values in Id Query are sorted correctly), set the Data Color property of the Relative Start values to white (or whatever the background of your chart is) so that it’s invisible, and finally set the axis of your stacked bar chart to be sorted by Id Query in ascending order, you get a visual something like this:

This is a kind of Gantt chart where the x axis shows the number of seconds since the start of the first query execution and each bar shows the start and end times of each query, making it easy to see which queries are executing in parallel.

“What can this show us?” I hear you say. Well, that’s something for a future blog post. All that I can say right now is that it’s COMPLICATED and I don’t properly understand it myself yet. But it’s undoubtedly interesting.

View Native Query Now Works For Analysis Services Data Sources

If you’re familiar with the topic of query folding in Power Query, you’ll know that the View Native Query right-click option in the Applied Steps pane of the Power Query Editor can be used to show the native query that is run against the data source. You may also know that there are some data sources where query folding does take place but where View Native Query remains greyed out. One of those used to be Analysis Services, but the good news is that that is no longer the case: you can use View Native Query when importing data from Analysis Services! Look:

I’m told it also now works for SAP BW, but I haven’t tested it.

Viewing Data Privacy Partition Information With Power Query Query Diagnostics

Back in May 2020 a new feature was added to Power BI’s Power Query Query Diagnostics functionality: the ability to view data privacy partition information. The announcement blog post has some basic information on this feature but in this post I’ll go into a bit more detail about how it actually works.

If you want a refresher on the subject of Power Query data privacy then this video is a good place to start. The Power Query documentation also has a detailed article on what data privacy partitions are.

Let’s see how Query Diagnostics displays partition information using a simple example. First of all, here’s a query called DayName that reads data from an Excel file that looks like this:

and returns the text of the day name shown:

let
  Source = Excel.Workbook(
      File.Contents("C:\SelectedDay.xlsx"), 
      null, 
      true
    ),
  DayName_Table
    = Source{[Item = "DayName", Kind = "Table"]}[Data],
  #"Changed Type" = Table.TransformColumnTypes(
      DayName_Table, 
      {{"Day Name", type text}}
    ),
  #"Day Name" = #"Changed Type"{0}[Day Name]
in
  #"Day Name"

Secondly, here’s a query that gets data from the DimDate table in the Adventure Works DW in SQL Server and filters the EnglishDayNameOfWeek column by the day name returned from the Excel workbook:

let
  Source = Sql.Databases("localhost"),
  AdventureWorksDW2017 = Source{[Name
    = "AdventureWorksDW2017"]}[Data],
  dbo_DimDate = AdventureWorksDW2017{[
    Schema = "dbo", 
    Item = "DimDate"
  ]}[Data],
  #"Filtered Rows" = Table.SelectRows(
      dbo_DimDate, 
      each ([EnglishDayNameOfWeek] = DayName)
    )
in
  #"Filtered Rows"

Both the Excel workbook and the SQL Server database data sources have their privacy level set to Public.

With the Query Diagnostics options set to record data privacy partition information:

…you can click on Diagnose Step for the last step of the second query above and you’ll get a diagnostics query that returns the following:

There are three rows in this query, one for each partition (only the leftmost columns are shown here). Each partition is identified by the values in the Partition Key column. The second and third rows represent the partitions that access the SQL Server database and the Excel workbook respectively. You can see there are list values in the Accessed Resources column for these rows: these are lists of the data sources accessed by these partitions. For example here’s what the list contains for the second row above:

The first row represents the partition where the data from the DimDate table is filtered by the day name from Excel. This row has a null value for Accessed Resources because it doesn’t access any data sources directly; it does have a value in the Partition Inputs column, another list, that returns the keys of the partitions that feed data into this partition. Here’s what that list looks like:

The values in the Firewall Group column tell you something useful about the data sources used in each partition. As the article on partitioning explains, when data passes from one partition to another, it may be buffered in memory – which may then have an impact on performance. Certain types of data source such as local files and folders are considered trusted, which means that they won’t be buffered even in the presence of Private data so long as all the data sources in the partition are trusted too.

Scrolling to the right-hand side of the query shows information on the amount of time taken to evaluate each partition, while the Diagnostics column contains a nested table that has the rows from the Detailed diagnostics query but just filtered by partition:

Indeed, if you look at the Detailed and Aggregated query diagnostics queries you’ll also see a Partition Key column, which not only means that you can see partition information when looking at all the diagnostics information for your query, but also means that in situations where the Partitions query diagnostics query doesn’t appear (which happens…) you can still see useful information about data privacy.

Power Query data privacy is one of the most difficult subjects in the whole of Power BI, but it’s also really important that you understand it: apart from the performance implications it can determine whether your query even runs or not. Having information like this in Query Diagnostics means that for the first time we have detailed information about how data privacy rules are applied in a particular query.

The M Behind The New Power Query Data Types In Excel

The big news this week – at least for me – was the release of the new Power Query data types to the Excel insiders channel. You can read all about it here:

https://insider.office.com/en-us/blog/power-query-data-types-in-excel

They’re the latest manifestation of Excel linked data types; cool things are also happening with them and Power BI featured tables too.

The announcement blog post explains pretty much everything you can do right now with Power Query data types but I was curious about the M code that is used to create them. Here’s an example query that takes this source table:

…and creates this Power Query data type:

let
  Source = #table(
      type table[
        Fruit = text, 
        Colour = text, 
        Sales = number
      ], 
      {
        {"Apples", "Green", 10}, 
        {"Lemons", "Yellow", 20}, 
        {"Strawberries", "Red", 30}
      }
    ),
  #"Created data type"
    = Table.CombineColumnsToRecord(
        Source, 
        "Data type", 
        {"Fruit", "Colour", "Sales"}, 
        [DisplayNameColumn = "Fruit", TypeName
          = "Excel.DataType"]
      )
in
  #"Created data type"

The magic happens in the #”Created data type” step, with the Table.CombineColumnsToRecord function: Power Query data types are basically columns that contain record values with (I guess, I need to check) some extra metadata.
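
One quick way to explore that theory is to pull a plain value back out of the record column; here’s a sketch, assuming the query above is saved with the name FruitDataTypes:

//A sketch, with a hypothetical query name: the "Data type" column
//behaves like an ordinary record column, so you can extract the
//Sales field from the record in the first row
let
  FirstRecord = FruitDataTypes{0}[Data type],
  SalesValue = FirstRecord[Sales]
in
  SalesValue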
