Displaying Azure Maps In A Power BI Paginated Report

The built-in mapping functionality in Power BI paginated reports is fairly basic. However, the integration of Power Query into Power BI paginated reports gives you an interesting new way of creating maps in paginated reports: you can call the Azure Maps API using Power Query and display the image returned in an Image report item. In this blog post I’ll show you how.

Here’s a quick summary of what I’m going to do:

  • Call the API from https://data.police.uk/ (specifically the Crimes At Location endpoint) using Power Query to get all the recorded crimes within a one mile radius of a given latitude and longitude in a given month for any location in England, Wales or Northern Ireland
  • Take this list of crimes and pass them to the Azure Maps API Get Map Static Image endpoint to return an image of a map with the crime locations on it
  • Display this image in an Image report part in a paginated report

And here’s an example of what the final paginated report will look like:

Step 1: Sign up for the Azure Maps API

In order to call the Azure Maps API you’ll need to go to the Azure Portal and create a resource. The pricing is very reasonable: the first 1000 calls to the endpoint used here are free and after that it’s $4.50 per month for up to 500,000 calls, which should be more than enough for BI purposes.

Step 2: Create Shareable Cloud Connections

To connect to data sources in Power Query in paginated reports you need to create Shareable Cloud Connections in the Power BI portal. You’ll need two connections for this report: one for the Azure Maps API with the URL https://atlas.microsoft.com/map/static/png and one for the Crime API with the URL https://data.police.uk/api/crimes-street/all-crime. Both SCCs should have the authentication method Anonymous and the privacy level Public and have the Skip Test Connection option checked:

Step 3: Create a paginated report and Power Query query to call APIs

After creating a new paginated report in Power BI Report Builder you need to create a dataset (called AzureMap here) to get data from the APIs. This dataset uses Power Query as a source and has one main query (also called AzureMap) and four parameters:

  • lon and lat, to hold the longitude and latitude of the location to get crime data for, which will also be the centre point of the map
  • zoom, which is the zoom level of the map
  • yearmonth, which is the year and month in YYYY-MM format to get crime data for:
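
Behind the scenes, each of these parameters is just a Power Query parameter query. As a rough illustration (the value below is just an example of mine, not anything from the report above), the M that gets generated for a numeric parameter like lat looks something like this:

51.5074 meta [IsParameterQuery = true, Type = "Number", IsParameterQueryRequired = true]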

Here’s the M code for the query:

let
  CallCrimeAPI = Json.Document(
    Web.Contents(
      "https://data.police.uk/api/crimes-street/all-crime",
      [
        Query = [
          lat  = Text.From(lat),
          lng  = Text.From(lon),
          date = yearmonth
        ]
      ]
    )
  ),
  ToTable = Table.FromList(
    CallCrimeAPI,
    Splitter.SplitByNothing(),
    null,
    null,
    ExtraValues.Error
  ),
  First50 = Table.FirstN(ToTable, 50),
  ExpandColumn1 = Table.ExpandRecordColumn(
    First50,
    "Column1",
    {"location"},
    {"location"}
  ),
  Expandlocation = Table.ExpandRecordColumn(
    ExpandColumn1,
    "location",
    {"latitude", "street", "longitude"},
    {
      "location.latitude",
      "location.street",
      "location.longitude"
    }
  ),
  JustLatLon = Table.SelectColumns(
    Expandlocation,
    {"location.longitude", "location.latitude"}
  ),
  TypeToText = Table.TransformColumnTypes(
    JustLatLon,
    {
      {"location.longitude", type text},
      {"location.latitude", type text}
    }
  ),
  MergedColumns = Table.CombineColumns(
    TypeToText,
    {"location.longitude", "location.latitude"},
    Combiner.CombineTextByDelimiter(
      " ",
      QuoteStyle.None
    ),
    "LongLat"
  ),
  PrefixPipe = Table.TransformColumns(
    MergedColumns,
    {{"LongLat", each "|" & _, type text}}
  ),
  GetString = "|"
    & Text.Combine(PrefixPipe[LongLat]),
  QueryRecord = [
    #"subscription-key"
      = "InsertYourSubscriptionKeyHere",
    #"api-version" = "2022-08-01",
    layer = "basic",
    style = "main",
    #"zoom" = Text.From(zoom),
    center = Text.From(lon) & ", " & Text.From(lat),
    width = "768",
    height = "768"
  ],
  AddPins = try
    Record.AddField(
      QueryRecord,
      "pins",
      "default|sc0.5" & GetString
    )
  otherwise
    QueryRecord,
  CallAzureMapsAPI = Web.Contents(
    "https://atlas.microsoft.com/map/static/png",
    [Query = AddPins]
  ),
  ToText = Binary.ToText(
    CallAzureMapsAPI,
    BinaryEncoding.Base64
  ),
  OutputTable = #table(
    type table [image = text],
    {{ToText}}
  )
in
  OutputTable

You need to put all this code in a single M query to avoid the “Formula.Firewall: Query ‘Query1’ (step ‘xyz’) references other queries or steps, so it may not directly access a data source. Please rebuild this data combination.” error. You can find out more about this error by watching my data privacy video here.

A few things to note:

  • The CallCrimeAPI step calls the Get Crimes At Location API endpoint to get all the reported crimes within a one mile radius of the given latitude and longitude in the given year and month.
  • Because of the way I’m sending the crime location data to the Azure Maps API I limited the number of locations to 50, in the First50 step, to avoid hitting errors relating to the maximum length of a URL.
  • The GetString step returns a pipe-delimited list of longitudes and latitudes of crime locations for the Azure Maps API to display as pins on the map. However, some error handling is needed in case there were no reported crimes in the given location or month, and that is handled in the AddPins step.
  • The QueryRecord step contains all the parameters to send to the Azure Maps Get Map Static Image endpoint. This docs page has more information on what’s possible with this API – I’m barely scratching the surface of what’s possible in this example.
  • Authentication to the Azure Maps API is via a subscription key which you’ll need to pass to the subscription-key parameter. You can get the key from the resource created in step 1 in the Azure Portal.
  • The API returns an image binary which is converted to text and returned in a table with one column and one row in the ToText and OutputTable steps. The code is similar to what I showed in this blog post but luckily I didn’t seem to need to break it up into multiple rows.

Step 4: Create Power Query query to return values for Zoom parameter

The Zoom parameter of the Get Map Static Image API endpoint accepts a value between 0 and 20, which represents the zoom level of the displayed map. You need to create a separate dataset and M query to return a table containing those values with the following code:

let
  Source = {0 .. 20}, 
  #"Converted to table" = Table.FromList(
    Source, 
    Splitter.SplitByNothing(), 
    null, 
    null, 
    ExtraValues.Error
  ), 
  #"Changed column type"
    = Table.TransformColumnTypes(
    #"Converted to table", 
    {{"Column1", Int64.Type}}
  ), 
  #"Renamed columns" = Table.RenameColumns(
    #"Changed column type", 
    {{"Column1", "Zoom"}}
  )
in
  #"Renamed columns"

Step 5: Create paginated report parameters

Next you need to create four parameters in the paginated report for the longitude, latitude, zoom level and year month:

To make it easy for end users to select a zoom level, you need to bind the available values for the zoom parameter to the table returned by the dataset from the previous step:

Step 6: Display the map in an Image report part

In the paginated report itself the only interesting thing is the configuration of the Image report part in the centre of the report:

You need to set the image source to “Database”, bind it to the following expression

=First(Fields!image.Value, "AzureMap")

…which gets the text value from the sole row and column in the table returned by the AzureMap dataset created in step 3, and set the MIME type to be “image/png”.

And that’s it! After publishing you can enter any latitude and longitude in England, Wales or Northern Ireland, a year and month, and a zoom level, and get all the reported crimes on a map:

You can download the .rdl file with the paginated report here (remember to edit the AzureMap query to insert your Azure Maps API key).

Power BI Paginated Reports That Connect To Web Services And Excel

By far the most exciting announcement for me this week was the new release of Power BI Report Builder that has Power Query built in, allowing you to connect to far more data sources in paginated reports than you ever could before. There’s a very detailed blog post and video showing you how this new functionality works here:

https://powerbi.microsoft.com/en-us/blog/get-data-with-power-query-available-in-power-bi-report-builder-preview

The main justification for building this feature was to allow customers to build paginated reports on sources like Snowflake or BigQuery, something which had only been possible before if you used an ODBC connection via a gateway or built a semantic model in between – neither of which is an ideal solution. However it also opens up a lot of other possibilities too.

For example, you can now build paginated reports on web services (with some limitations). I frequently get asked about building regular Power BI reports that get data from web services on demand – something which isn’t possible, as I explained here. To test using paginated reports on a web service I registered for Transport for London’s APIs and built a simple report on top of their Journey Planner API (Transport for London are the organisation that manages public transport in London). This report allows you to enter a journey starting point and ending point anywhere in or around London, calls the API and returns a table with different routes from the start to the destination, along with timings and instructions for each route. Here’s the report showing different routes for a journey from 10 Downing Street in London to Buckingham Palace:
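
I won’t go through the whole report here, but at its heart the dataset query is just a call to Web.Contents against the Journey Planner endpoint. Here’s a rough, simplified sketch of the kind of M involved (the journeyfrom and journeyto parameter names and the response handling are illustrative rather than the exact code from my report, and you’ll need to pass your own TfL application key):

let
  //journeyfrom and journeyto are Power Query parameters holding the start and end points
  CallJourneyAPI = Json.Document(
    Web.Contents(
      "https://api.tfl.gov.uk/",
      [
        RelativePath = "Journey/JourneyResults/" & journeyfrom & "/to/" & journeyto,
        Query = [app_key = "InsertYourTfLAppKeyHere"]
      ]
    )
  ),
  //the response contains a list of possible journeys
  ToTable = Table.FromList(
    CallJourneyAPI[journeys],
    Splitter.SplitByNothing(),
    null,
    null,
    ExtraValues.Error
  )
in
  ToTable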

You can also build paginated reports that connect to Excel workbooks that are stored in OneDrive or OneLake, meaning that changes made in the Excel workbook show up in the report as soon as the workbook is saved and closed:
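
For the OneDrive case, one way of doing this (I’m only sketching the general pattern with a made-up URL here, not showing the exact query from my report) is to point Excel.Workbook at the workbook’s direct OneDrive for Business URL via Web.Contents and authenticate with an organizational account:

let
  //hypothetical direct OneDrive for Business URL for the workbook
  Source = Excel.Workbook(
    Web.Contents(
      "https://contoso-my.sharepoint.com/personal/chris_contoso_com/Documents/Sales.xlsx"
    ),
    null,
    true
  ),
  //read the data from a table called Sales in the workbook
  Sales_Table = Source{[Item = "Sales", Kind = "Table"]}[Data]
in
  Sales_Table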

So. Much. Fun. I’ll probably develop a presentation for user groups explaining how I built these reports soon.

And yes, if you need to export data to Excel on a schedule, paginated reports are now an even better choice. You know your users want this.

Overhead Of Getting Relationship Columns In Power BI DirectQuery Mode

Many Power BI connectors for relational databases, such as the SQL Server connector, have an advanced option to control whether relationship columns are returned or not. By default this option is on. Returning these relationship columns adds a small overhead to the time taken to open a connection to a data source and so, for Power BI DirectQuery semantic models, turning this option off can improve report performance slightly.

What are relationship columns? If you connect to the DimDate table in the Adventure Works DW 2017 sample database using Power Query, you’ll see them on the right-hand side of the table. The following M code:

let
  Source = Sql.Database("localhost", "AdventureWorksDW2017"),
  dbo_DimDate = Source{[Schema = "dbo", Item = "DimDate"]}[Data]
in
  dbo_DimDate

…shows the relationship columns:

Whereas if you explicitly turn off the relationships by deselecting the “Include relationship columns” checkbox:

…you get the following M code with the CreateNavigationProperties property set to false:

let
  Source = Sql.Database("localhost", "AdventureWorksDW2017", [CreateNavigationProperties = false]),
  dbo_DimDate = Source{[Schema = "dbo", Item = "DimDate"]}[Data]
in
  dbo_DimDate

…and you don’t see those extra columns.

How much overhead does fetching relationship columns add? It depends on the type of source you’re using, how many relationships are defined and how many tables there are in your model (because the calls to get this information are not made in parallel). It’s also, as far as I know, impossible to measure the overhead from any public telemetry such as a Profiler trace or to deduce it by looking at the calls made on the database side. The overhead only happens when Power BI opens a connection to a data source and the result is cached afterwards, so it will only be encountered occasionally and not for every query that is run against your data source. I can say that the overhead can be quite significant in some cases though and can be made worse by other factors such as a lack of available connections or network/gateway issues. Since I have never seen anyone actually use these relationship columns in a DirectQuery model – they are quite handy in Power Query in general though – you should always turn them off when using DirectQuery mode.

[Thanks to Curt Hagenlocher for the information in this post]

Reading Parquet Metadata In Power Query In Power BI

There’s a new M function in Power Query in Power BI that allows you to read the metadata from a Parquet file: Parquet.Metadata. It’s not documented yet and it’s currently marked as “intended for internal use only” but I’ve been told I can blog about it. Here’s an example of how to use it:

let
  Source = Parquet.Metadata(File.Contents("C:\myfile.snappy.parquet"))
in
  Source

…and here’s an example of the output:

This query shows how to expand the record returned by this function into a table:

let
  m = Parquet.Metadata(File.Contents("C:\myfile.snappy.parquet")),
  schema = List.Accumulate(
    Table.ToRecords(m[Schema]),
    [],
    (x, y) => if y[NumChildren] = null then Record.AddField(x, y[Name], y[LogicalType] ?? y[ConvertedType]) else x
  ),
  expanded1 = Table.ExpandTableColumn(m[RowGroups], "Columns", {"MetaData"}),
  renamed1 = Table.RenameColumns(
    expanded1,
    {{"Ordinal", "RowGroup"}, {"TotalCompressedSize", "RowGroupCompressedSize"}, {"TotalByteSize", "RowGroupSize"}}
  ),
  expanded2 = Table.ExpandRecordColumn(
    renamed1,
    "MetaData",
    {"Type", "Encodings", "PathInSchema", "Codec", "NumValues", "TotalUncompressedSize", "TotalCompressedSize", "KeyValueMetadata", "DataPageOffset", "IndexPageOffset", "DictionaryPageOffset", "Statistics", "EncodingStats"}
  ),
  renamed2 = Table.RenameColumns(expanded2, {{"Type", "PhysicalType"}}),
  added1 = Table.AddColumn(renamed2, "Column", each Text.Combine([PathInSchema])),
  added2 = Table.AddColumn(added1, "Cardinality", each [Statistics][DistinctCount]),
  added3 = Table.AddColumn(added2, "NullCount", each [Statistics][NullCount]),
  added4 = Table.AddColumn(added3, "DictionarySize", each [DataPageOffset] - [DictionaryPageOffset]),
  added5 = Table.AddColumn(added4, "LogicalType", each Record.FieldOrDefault(schema, [Column], null)),
  selected = Table.SelectColumns(
    added5,
    {"RowGroup", "Column", "Codec", "NumValues", "Cardinality", "NullCount", "TotalCompressedSize", "TotalUncompressedSize", "DictionarySize", "PhysicalType", "LogicalType"}
  )
in
  selected

As you can see this gives you all kinds of useful information about a Parquet file such as the schema, the compression type used, column cardinality and so on.

[Thanks to Curt Hagenlocher for the tip-off and the query above]

Understanding The “Evaluation resulted in a stack overflow” Error In Power Query In Excel And Power BI

If you’re writing your own M code in Power Query in Power BI or Excel you may run into the following error:

Expression.Error: Evaluation resulted in a stack overflow and cannot continue.

If you’re a programmer you probably know what a stack overflow is; if you’re not you might search for the term, find this Wikipedia article and still have no clue what has happened. Either way it may still be difficult to understand why you’re running into it in Power Query. To explain let’s see some examples.

First, a really simple one. You can write recursive functions – functions that call themselves – in Power Query, although as you might suspect that is generally a bad idea because they are difficult to write, difficult to understand, slow and may result in the error above. Consider the following function called MyRecursiveFunction which references a parameter of type number called MaxDepth:

(counter as number) as number =>
  if counter > MaxDepth then counter else @MyRecursiveFunction(counter + 1)

The function takes a number and calls itself, passing in a value one greater than the number that was passed to it, until the number passed is greater than MaxDepth. So, if the value of MaxDepth is 3 and you call the function and pass it the value of 1, like so:

MyRecursiveFunction(1)

…then you’ll get the value 4 back:

So far so good. But how long can a function in M go on calling itself? As you can imagine, it’s not forever. So, when you hit the point where the function can’t go on calling itself then you get the error above. For example if you try setting MaxDepth to 100000 then you’ll get the stack overflow error above instead of 100001:

As a result it’s almost always a good idea to avoid recursion in Power Query and use functions like List.Transform, List.Generate or List.Accumulate to achieve the same result. A great example of this is shown in the Power Query custom connector documentation in the section on handling APIs that return results broken up into pages with the Table.GenerateByPage code sample.
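
For example, here’s a rough non-recursive equivalent of the function above (my own illustration rather than code from the docs) that uses List.Generate to do the counting instead of recursion:

let
  MaxDepth = 100000,
  //generate the sequence of counter values 1, 2, 3, ... up to MaxDepth
  Counters = List.Generate(() => 1, each _ <= MaxDepth, each _ + 1),
  //the recursive function above would return the first value greater than MaxDepth
  Result = List.Last(Counters) + 1
in
  Result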

You may still get this error even when you’re not explicitly using recursion though, as a result of lazy evaluation. Consider the following query which uses List.Accumulate to generate a table with a given number of rows:

let
  //define a table with one row and one column called x
  MyTable = #table(type table [x = number], {{1}}),
  //specify how many rows we want in our output table
  NumberOfTimes = 3,
  //Use List.Accumulate to create a table with this number of rows
  //by calling Table.Combine
  CombineAllTables = List.Accumulate(
    {1 .. NumberOfTimes},
    null,
    (state, current) => if current = 1 then MyTable else Table.Combine({state, MyTable})
  )
in
  CombineAllTables

Here’s the output, a table with three rows:

But how does it get this result? With NumberOfTimes=3 you can think of this query lazily building up an M expression something like this:

Table.Combine({Table.Combine({MyTable, MyTable}), MyTable})

…which, once List.Accumulate has finished, suddenly all has to be evaluated and turned into a single table. Imagine how much nesting of Table.Combine there would be if NumberOfTimes was a much larger number though! And indeed, it turns out that you can’t make lots and lots of calls to Table.Combine without running into a stack overflow. So if NumberOfTimes=100000 like so:

let
  //define a table with one row and one column called x
  MyTable = #table(type table [x = number], {{1}}),
  //specify how many rows we want in our output table
  NumberOfTimes = 100000,
  //Use List.Accumulate to create a table with this number of rows
  //by calling Table.Combine
  CombineAllTables = List.Accumulate(
    {1 .. NumberOfTimes},
    null,
    (state, current) => if current = 1 then MyTable else Table.Combine({state, MyTable})
  )
in
  CombineAllTables

…then, after a minute or so, you get the “Evaluation resulted in a stack overflow and cannot continue” error again.

Rewriting the query so you build up the list of tables first and only call Table.Combine once at the end avoids the problem and is much faster:

let
  //define a table with one row and one column called x
  MyTable = #table(type table [x = number], {{1}}),
  //specify how many rows we want in our output table
  NumberOfTimes = 100000,
  //create a table with NumberOfTimes rows
  CombineAllTables = Table.Combine(List.Repeat({MyTable}, NumberOfTimes))
in
  CombineAllTables

It’s also possible to solve the problem by forcing eager evaluation inside List.Accumulate but this is extremely tricky: there’s an example of this on Gil Raviv’s blog here.
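
One common trick (not necessarily the exact approach in Gil’s post) is to force each intermediate table to be evaluated eagerly by wrapping it in Table.Buffer inside the accumulator function, something like this:

let
  MyTable = #table(type table [x = number], {{1}}),
  NumberOfTimes = 100000,
  //wrapping each intermediate result in Table.Buffer forces it to be evaluated
  //on every iteration, so the deeply nested lazy Table.Combine calls never build up
  //(although this is likely to be very slow for large values of NumberOfTimes)
  CombineAllTables = List.Accumulate(
    {1 .. NumberOfTimes},
    null,
    (state, current) =>
      if current = 1
      then MyTable
      else Table.Buffer(Table.Combine({state, MyTable}))
  )
in
  CombineAllTables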

Getting Different Versions Of Data With Value.Versions In Power Query

Something I mentioned in my recent post on the Fabric blog about the new DeltaLake.Table M function was the fact that you can get different versions of the data held in a Delta table using the Value.Versions M function. In fact, Value.Versions is the way to access different versions of data in any source that has this concept – so long as Power Query has added support for doing so. The bad news is that, at least at the time of writing, apart from the DeltaLake connector there’s only one other source where Value.Versions can be used in this way: the connector for Fabric Lakehouses.

Here’s how you can access the data in a table using the Lakehouse.Contents M function:

let
  Source = Lakehouse.Contents(),
  SelectWorkspace = Source{[workspaceId = "insertworkspaceid"]}[Data],
  SelectLakehouse = SelectWorkspace{[lakehouseId = "insertlakehouseid"]}[Data],
  SelectTable = SelectLakehouse{[Id = "nested_table", ItemKind = "Table"]}[Data]
in
  SelectTable

As with DeltaLake.Table, you can get a table with all the different versions available using Value.Versions:

let
  Source = Lakehouse.Contents(),
  SelectWorkspace = Source{[workspaceId = "insertworkspaceid"]}[Data],
  SelectLakehouse = SelectWorkspace{[lakehouseId = "insertlakehouseid"]}[Data],
  SelectTable = SelectLakehouse{[Id = "nested_table", ItemKind = "Table"]}[Data],
  ShowVersions = Value.Versions(SelectTable)
in
  ShowVersions

Version 0 is the earliest version; the latest version of the data is the version with the highest number and this version can also be accessed from the row with the version number null. The nested values in the Data column are tables which give you the data for that particular version number. So, for example, if I wanted to get the data for version 2 I could click through on the nested value in the Data column in the row where the Version column contained the value 2. Here’s the M code for this:

let
  Source = Lakehouse.Contents(),
  SelectWorkspace = Source{[workspaceId = "insertworkspaceid"]}[Data],
  SelectLakehouse = SelectWorkspace{[lakehouseId = "insertlakehouseid"]}[Data],
  SelectTable = SelectLakehouse{[Id = "nested_table", ItemKind = "Table"]}[Data],
  ShowVersions = Value.Versions(SelectTable),
  Data = ShowVersions{2}[Data]
in
  Data

The Lakehouse connector uses the TDS Endpoint of the Lakehouse to get data by default, as in the first code snippet above, but if you use Value.Versions to get specific versions this isn’t (as yet) possible, so it will use a slower method to get data and performance may suffer.

Last of all, you can get the version number of the data you’re looking at using the Value.VersionIdentity function. If you’re looking at the latest version of the data then Value.VersionIdentity will return null:

let
  Source = Lakehouse.Contents(),
  SelectWorkspace = Source{[workspaceId = "insertworkspaceid"]}[Data],
  SelectLakehouse = SelectWorkspace{[lakehouseId = "insertlakehouseid"]}[Data],
  SelectTable = SelectLakehouse{[Id = "nested_table", ItemKind = "Table"]}[Data],
  GetVersion = Value.VersionIdentity(SelectTable)
in
  GetVersion

If you are looking at version 2 of the data then Value.VersionIdentity will return 2:

let
  Source = Lakehouse.Contents(),
  SelectWorkspace = Source{[workspaceId = "insertworkspaceid"]}[Data],
  SelectLakehouse = SelectWorkspace{[lakehouseId = "insertlakehouseid"]}[Data],
  SelectTable = SelectLakehouse{[Id = "nested_table", ItemKind = "Table"]}[Data],
  ShowVersions = Value.Versions(SelectTable),
  GetVersion2 = ShowVersions{2}[Data],
  GetVersionNumber = Value.VersionIdentity(GetVersion2)
in
  GetVersionNumber

Anti Joins And Query Folding In Power Query In Excel And Power BI

Power Query allows you to merge (“join” in database terms) two tables together in a variety of different ways, including left and right anti joins. Unfortunately, as I found recently, anti joins don’t fold on SQL Server-related data sources, which can result in performance problems. Luckily there is a different way of doing anti joins that does fold.

Say you have two Power Query queries called Fruit1 and Fruit2 that get data from SQL Server tables containing the names of different varieties of fruit:

Now, let’s say you want to get a list of all the fruit varieties that are in Fruit1 and not in Fruit2. The obvious thing to do is to do a Merge and use the Left Anti option like so:

Here’s the M code, including an extra step to remove the join column that this creates:

let
    Source = Table.NestedJoin(Fruit1, {"Fruit"}, Fruit2, {"Fruit"}, "Fruit2", JoinKind.LeftAnti),
    #"Removed Other Columns" = Table.SelectColumns(Source,{"Fruit"})
in
    #"Removed Other Columns"

This gives you the correct result:

…but it doesn’t fold, so performance may be bad.

However, if you do a Merge and use the Left Outer option instead:

Then expand the join column (called Fruit2.Fruit here):

And then filter on that column so you only keep the rows where it contains the value null, and then remove that column, you get the same result:

Here’s the M:

let
    Source = Table.NestedJoin(Fruit1, {"Fruit"}, Fruit2, {"Fruit"}, "Fruit2", JoinKind.LeftOuter),
    #"Expanded Fruit2" = Table.ExpandTableColumn(Source, "Fruit2", {"Fruit"}, {"Fruit2.Fruit"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded Fruit2", each ([Fruit2.Fruit] = null)),
    #"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"Fruit"})
in
    #"Removed Other Columns"

This now does fold (meaning performance should be better) and gives you the following SQL:

select [_].[Fruit]
from 
(
    select [$Outer].[Fruit],
        [$Inner].[Fruit2]
    from [dbo].[Fruit1] as [$Outer]
    left outer join 
    (
        select [_].[Fruit] as [Fruit2]
        from [dbo].[Fruit2] as [_]
    ) as [$Inner] on ([$Outer].[Fruit] = [$Inner].[Fruit2] or [$Outer].[Fruit] is null and [$Inner].[Fruit2] is null)
) as [_]
where [_].[Fruit2] is null

Read Data From Delta Lake Tables With The DeltaLake.Table M Function

[This post was originally published on the official Power Query blog, which has now been taken down. I’m republishing all my posts there to this blog to ensure the content remains available.]

There’s a new function in Power Query’s M language for reading data from Delta Lake tables: the DeltaLake.Table function. This function is now available in Power Query in Power BI Desktop and in Dataflows Gen1 and Gen2 and replaces the need to use community-developed solutions like this one by Gerhard Brueckl. Let’s see some examples of how to use it.

Connecting to Delta Lake tables

DeltaLake.Table can connect to any folder that represents a Delta Lake table (which means, for example, that the folder you connect to should contain a subfolder called _delta_log) in either ADLSgen2 storage, a Fabric Lakehouse or a local file system. If you have a folder on your local PC with the path “C:\DeltaCoffee” that contains a Delta Lake table like so:

You can read the data in this Delta Lake table using the following M code:

let
    Source = Folder.Contents("C:\DeltaCoffee"),
    ToDelta = DeltaLake.Table(Source)
in
    ToDelta

The output in Power Query is a table with the contents of the Delta Lake table:

You can connect to Delta Lake tables in ADLSgen2 or a Fabric Lakehouse in a very similar way, using the AzureStorage.DataLake function to connect to the DFS endpoint of the folder containing the Delta Lake table. Here’s an example of how to connect to a folder in a Fabric Lakehouse using AzureStorage.DataLake (which works because OneLake supports access via existing ADLSgen2 APIs):

let
    Source = AzureStorage.DataLake("https://onelake.dfs.fabric.microsoft.com/insertworkspaceidhere/insertlakehouseidhere/Tables/DeltaCoffee/", [HierarchicalNavigation=true]),
    ToDelta = DeltaLake.Table(Source)
in
    ToDelta

Note that you need to set the HierarchicalNavigation option to true in the AzureStorage.DataLake function for this to work.

Nested structures

The DeltaLake.Table function can handle nested structures in Delta Lake tables in a similar way to how Power Query handles nested structures in other data sources. In the example shown above the coffee_profile column contains a nested structure. Here’s what it looks like when viewed inside a Fabric Lakehouse:

As shown in the first screenshot above the coffee_profile column in the table returned by DeltaLake.Table in Power Query contains Record values, which can be expanded by clicking on the icon in the top right-hand corner of the column:

Here’s what the table looks like in Power Query after the coffee_profile column has been expanded:

By default DeltaLake.Table will convert up to 32 levels of nested structures in this way. The number of levels of nesting that is converted can be controlled using the MaxDepth option, for example:

DeltaLake.Table(Source, [MaxDepth=10])

If the number of levels of nesting exceeds the maximum depth then an error will be raised.

Use of statistics

Power Query will use statistics to prune partitions and compute aggregates by default. If you need to stop it using statistics, perhaps because you know they are incorrect, you can do so by setting the UseStatistics option to false like so:

DeltaLake.Table(Source, [UseStatistics=false])

Time travel

Different versions of the Delta Lake table can be accessed using the Value.Versions function on the table returned by DeltaLake.Table, for example:

let
    Source = Folder.Contents("C:\DeltaCoffee"),
    ToDelta = DeltaLake.Table(Source),
    GetVersions = Value.Versions(ToDelta)
in
    GetVersions

…returns the following table, with one row for each version of the table and an extra row with a value of null in the Version column for the latest version:

The Data column contains values of type Value, which return the data in the table for each version. In the screenshot above the latest version is version 3; the row with the version number null, the latest version, returns the same data as the row with the version number 3.

If you access a specific version of the data you can get the version number by using the Value.VersionIdentity function. For example, the following M query gets the data for version number 2 of the Delta Lake table in the GetVersionNumberTwo step, which means that when Value.VersionIdentity is used in the final step it returns the value 2:

let
    Source = Folder.Contents("C:\DeltaCoffee"),
    ToDelta = DeltaLake.Table(Source),
    GetVersions = Value.Versions(ToDelta),
    GetVersionNumberTwo = GetVersions{2}[Data],
    GetVersionIdentity = Value.VersionIdentity(GetVersionNumberTwo)
in
    GetVersionIdentity

Understanding The “External table is not in the expected format” Error In Power Query In Power BI And Excel

Sometimes when you’re importing data from files using Power Query in either Power BI or Excel you may encounter the following error:

DataFormat.Error: External table is not in the expected format

What causes it? TL;DR: it’s because you’re trying to load data from one type of file, probably Excel (I don’t think you can get this error with any other source but I’m not sure), while actually connecting to a different type of file.

Let’s see a simple example. Say you have a folder with two files: one is an Excel file called Date.xlsx and one is a CSV file called Date.csv.

Here’s the M code for a Power Query query that connects to the Excel file and reads the data from a table in it:

let
  Source     = Excel.Workbook(File.Contents("C:\MyFolder\Date.xlsx"), null, true), 
  Date_Table = Source{[Item = "Date", Kind = "Table"]}[Data]
in
  Date_Table

Now, if you change the file path in this query – and only the file path – to point at the CSV file instead like so:

let
  Source     = Excel.Workbook(File.Contents("C:\MyFolder\Date.csv"), null, true), 
  Date_Table = Source{[Item = "Date", Kind = "Table"]}[Data]
in
  Date_Table

…you will get the “external table is not in the expected format” error shown above. This is because your code is using the Excel.Workbook M function, which is used to import data from Excel workbooks, to connect to a file that is a CSV file and not an Excel workbook. The way to fix it is to use the appropriate function, in this case Csv.Document, to access the file like so:

let
  Source = Csv.Document(
    File.Contents("C:\MyFolder\Date.csv"), 
    [Delimiter = ",", Columns = 4, Encoding = 65001, QuoteStyle = QuoteStyle.None]
  ), 
  #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars = true])
in
  #"Promoted Headers"

To be honest, if making this change is beyond your Power Query skills and you’re sure you’re trying to connect to the right file, you’re better off creating a completely new query rather than editing the query you already have.

Another common scenario where you might encounter this error is when you’re importing data from all the files in a folder and one of the files isn’t in the correct format. For example, let’s say you have a folder with three Excel files in and you use the Folder data source to import all the data from all three files:

Since all three files are Excel files the Folder option will work:

However, if you take a CSV file and drop it into the folder like so:

Then you’ll get the same error in Power Query:

Apart from deleting the CSV file you have another option to solve this problem in this case: filtering the folder so you only try to get data from the .xlsx files and no other file type. To do this, click on the step that is called “Source”. When you do this you’ll see that the step returns a table containing all the files in the folder you’re pointing at:

You’ll see that the table in this step contains a column called Extension which contains the file extension for each file. If you filter this table – which will insert a new step at this point in the query, which is ok – by clicking on the down arrow in the Extension column, deselecting the (Select All) option and selecting “.xlsx” so the table only contains .xlsx files, then you can avoid this problem:
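
Here’s a rough sketch of what the M for that filtered Source step might look like on its own (the folder path is just an example):

let
  Source = Folder.Files("C:\MyFolder"),
  //keep only the .xlsx files so that Excel.Workbook is never called on the CSV file
  #"Filtered Rows" = Table.SelectRows(Source, each [Extension] = ".xlsx")
in
  #"Filtered Rows"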

If, as in this example, the rogue file happens to be the first file in the folder and you’ve selected that first file to be your “sample” file when setting up the import, then you’ll also need to go to the query called Sample File in the Queries pane and make exactly the same change there (ie click on the Source step and filter to remove any non .xlsx files).

Identifying CPU And Memory-Intensive Power Query Queries During Refresh In The Power BI Service

Last year I wrote a post about a change in the Power BI Service that meant the CPU Time associated with Power Query transformations was added to the total shown for a dataset refresh operation in Profiler traces and Log Analytics:

https://blog.crossjoin.co.uk/2022/07/03/measuring-power-query-cpu-usage-during-power-bi-dataset-refresh/

This was useful, but it didn’t tell you directly how much CPU Time was used by Power Query and it didn’t tell you which tables or partitions in a refresh were using the most CPU. It also didn’t tell you anything about Power Query memory usage. The good news is that there has recently been another change that solves these problems.

Let’s say you have a Power BI dataset that consists of a single table whose source is the following Power Query query:

let
  Source = #table(type table [MyNumber = number], List.Transform({1 .. 1000000}, each {_})), 
  #"Added Custom" = Table.AddColumn(
    Source, 
    "ARandomNumber", 
    each Number.RandomBetween(0, 10000), 
    type number
  ), 
  #"Sorted Rows" = Table.Sort(#"Added Custom", {{"ARandomNumber", Order.Ascending}})
in
  #"Sorted Rows"

This query creates a table with a million rows, adds a column with random numbers in and then sorts on that column – which is, as you’d expect, a very CPU and memory-hungry operation.

If you refresh this dataset in the Power BI Service and run a Profiler trace on it, looking at the Command Begin/End and Progress Report Begin/End events, this is what you’ll see:

The final Command End event shows the total duration of the refresh as well as the amount of CPU used by both the Analysis Services engine and Power Query – in this case 24094ms.

If you look at the Progress Report End event associated with the finish of the refresh for the only partition of the only table in the dataset (highlighted in the screenshot above), there’s some extra information:

It shows the amount of CPU Time and the maximum amount of memory used by the Power Query engine while refreshing this partition. In this case the Power Query engine used 19468ms of CPU and reached a peak of 581848KB of memory. I can tell this is going to be really useful for troubleshooting refresh performance issues and out-of-memory errors.

[Thanks to Akshai Mirchandani, Xiaodong Zhang, Ting-Wei Chang and Jast Lu for this information]