When Can Partitioned Compute Help Improve Fabric Dataflow Performance?

Partitioned Compute is a new feature in Fabric Dataflows that allows you to run certain operations inside a Dataflow query in parallel and therefore improve performance. While UI support is limited at the moment, it can be used in any Dataflow by adding a single line of fairly simple M code and checking a box in the Options dialog. But as with a lot of performance optimisation features (and this is particularly true of Dataflows) it can sometimes make performance worse rather than better – you need to know how and when to use it. So, in order to understand when this feature should and shouldn’t be used, I decided to do some tests and share the results here.

For my tests I created two queries within a single Dataflow Gen2 (CI/CD). First, an M function called SlowFunction that takes a numeric value and returns that value with 1 added to it after a two second delay:

(input as number) as number =>
Function.InvokeAfter(()=>input+1, #duration(0,0,0,2))
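
So, for example, calling the function like this returns 3 after roughly a two second pause:

SlowFunction(2) // waits two seconds, then returns 2+1=3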

Then I created the main query, which returns a table of ten rows and calls the SlowFunction function once per row:

let
Rows = List.Transform({1 .. 10}, each {_}),
MyTable = #table(type table [RowNumber = number], Rows),
#"Added custom" = Table.TransformColumnTypes(
Table.AddColumn(MyTable, "FunctionOutput", each SlowFunction([RowNumber])),
{{"FunctionOutput", Int64.Type}}
)
in
#"Added custom"

Here’s what the output of the query looks like:

Now, the first important question. How long does this Dataflow take to run? 10 rows calling a function that takes 2 seconds to run, so 10*2=20 seconds maybe? The answer is yes if you look at how long the preview takes to populate in the Dataflow Editor:

That’s just the preview though. If you’re refreshing a Dataflow there are other things that happen that affect how long that refresh takes, such as Staging and loading the data to a destination. There’s no way you can split the performance of your M code from these operations when looking at the duration of a Dataflow refresh in Recent Runs, which explains why some of the timings you will see later in this post seem strange. Don’t worry, though, it doesn’t stop you from seeing the important trends. I’m told that setting a CSV file in a Lakehouse as your data destination is the best way of minimising the impact of loading data on overall refresh durations but at the time of writing the CSV destination can’t be used with Partitioned Compute so all my tests used a Fabric Warehouse as a destination.

Here’s what Recent Runs showed when this Dataflow was refreshed:

The overall refresh time was 59 seconds; the query (called NonPartitioned here) that returns the table of ten rows and which was staged took 29 seconds.

Could this Dataflow benefit from Partitioned Compute? With Partitioned Compute enabled in the Options dialog, I added the necessary M code to the query:

let
Rows = List.Transform({1 .. 10}, each {_}),
MyTable = #table(type table [RowNumber = number], Rows),
#"Added custom" = Table.TransformColumnTypes(
Table.AddColumn(MyTable, "FunctionOutput", each SlowFunction([RowNumber])),
{{"FunctionOutput", Int64.Type}}
),
ReplacePartitionKey = Table.ReplacePartitionKey(#"Added custom", {"RowNumber"})
in
ReplacePartitionKey

…and then refreshed. Here’s what Recent Runs showed:

The overall refresh duration went up to 1 minute 25 seconds; the query that does all the work (called Partitioned in this case) took 40 seconds. Note that the screenshot immediately above shows the engine used is “PartitionedCompute” and that there are now ten activities listed instead of one: my M code used the RowNumber column in the table as the partition key, so the Dataflow attempted to run each row of the table as a separate operation in parallel. And as you can see, this made performance worse. This is because using Partitioned Compute introduces yet another overhead, and in this case that overhead is much greater than any benefit gained from parallelism.

So I wondered: what if the delay in the query is increased from 2 seconds to 100 seconds? Does this longer delay mean that parallelism results in faster overall performance?
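
Here’s a sketch of how the SlowFunction definition changes for the 100 second delay (the exact modified code isn’t shown in the screenshots below, so treat this as an illustration):

(input as number) as number =>
// Same logic as SlowFunction above, but with a 100 second (1 minute 40 second) delay
Function.InvokeAfter(() => input + 1, #duration(0, 0, 1, 40))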

Here’s what Recent Runs showed for a version of my Dataflow with a 100 second delay for each row and which didn’t use Partitioned Compute:

10 rows * 100 seconds = 1000 seconds = 16 minutes 40 seconds, so it’s not surprising that the overall duration of this version of the Dataflow was slow at 17 minutes 29 seconds.

Here’s what Recent Runs shows for the version of this Dataflow that did use Partitioned Compute:

The overall duration was 4 minutes 41 seconds and the main query took 3 minutes 14 seconds. The important takeaway is that this is a lot faster than the version that didn’t use Partitioned Compute, so clearly Partitioned Compute made a big difference to performance here. As you might expect, it looks like parallelising operations that only take a few seconds results in worse performance while parallelising operations that take longer, say a minute or more, is probably a good idea. As always, you’ll need to test to see what benefits you get for your Dataflows.

These results raise a lot of questions too. 100 seconds = 1 minute 40 seconds, which is a lot less than 3 minutes 14 seconds. Does this mean that not every row in the table was evaluated in parallel? Is partitioning on the RowNumber column counter-productive and would it be better to partition in some other way to try to reduce the amount of attempted parallelism? Is there something else that is limiting the amount of parallelism? While this version of the Dataflow always performs better than the non-partitioned version, performance did vary a lot between refreshes. While these tests show how useful Partitioned Compute can be for slow Dataflows, there’s a lot more research to do and a lot more blog posts to write.
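
For example, one obvious follow-up experiment would be to partition on a bucket column rather than on RowNumber, so that the ten rows are split into just two partitions instead of ten. Here’s a rough sketch of what that might look like; the Bucket column and the choice of two partitions are my own assumptions and not something tested above:

let
  Rows = List.Transform({1 .. 10}, each {_}),
  MyTable = #table(type table [RowNumber = number], Rows),
  #"Added custom" = Table.TransformColumnTypes(
    Table.AddColumn(MyTable, "FunctionOutput", each SlowFunction([RowNumber])),
    {{"FunctionOutput", Int64.Type}}
  ),
  // Assign each row to one of two buckets (0 or 1)...
  AddBucket = Table.AddColumn(#"Added custom", "Bucket", each Number.Mod([RowNumber], 2), Int64.Type),
  // ...and partition on Bucket instead of RowNumber
  ReplacePartitionKey = Table.ReplacePartitionKey(AddBucket, {"Bucket"})
in
  ReplacePartitionKey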

A Closer Look At Preview-Only Steps In Fabric Dataflows

I have been spending a lot of time recently investigating the new performance-related features that have rolled out in Fabric Dataflows over the last few months, so expect a lot of blog posts on this subject in the near future. Probably my favourite of these features is Preview-Only steps: they make such a big difference to my quality of life as a Dataflows developer.

The basic idea (which you can read about in the very detailed docs here) is that you can add steps to a query inside a Dataflow that are only executed when you are editing the query and looking at data in the preview pane; when the Dataflow is refreshed these steps are ignored. This means you can do things like add filters, remove columns or summarise data while you’re editing the Dataflow in order to make the performance of the editor faster or debug data problems. It’s all very straightforward and works well.
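
For example, here’s the kind of preview-only step you might add to a query that returns a large table so that the editor only works with a sample of rows. This is a hypothetical sketch and the server, database and table names are made up:

let
  Source = Sql.Database("myserver", "mydatabase"),
  SalesTable = Source{[Schema = "dbo", Item = "Sales"]}[Data],
  // Right-click this step and select "Enable only in previews": the editor then works
  // with just 100 rows, but a full refresh still loads the whole table
  Top100Rows = Table.FirstN(SalesTable, 100)
in
  Top100Rows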

The more I thought about this feature, though, the more I wondered about how it actually works and how it can be used in more complex queries that involve hand-written M code – so I decided to do a few tests. First of all, consider the following M query:

let
Step1 = 1,
Step2 = Step1 - 1
in
Step2

This query returns a numeric value: 0.

At this point it will return the same value in the editor when viewing the preview of the output and when the Dataflow refreshes. But if you right-click on Step2 and select “Enable only in previews” then the query still returns 0 in the preview but will return 1 when the Dataflow refreshes. You will also see this message displayed in the preview pane:

This data preview uses preview-only steps.
Results may differ when running the dataflow.

This makes sense because the query follows a linear pattern: the output references Step2, which in turn references Step1, so if you disable Step2 then the output simply skips it and returns the value of Step1.
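
In other words, my understanding is that at refresh time the query behaves as if it were written like this:

let
  Step1 = 1
  // Step2 is preview-only, so a refresh skips it and the query returns Step1, which is 1
in
  Step1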

But what if your M code is more complex? For example consider this query:

let
x = 5,
y = 8,
xtimesy = x*y,
outputtable = #table(type table[xy=number],{{xtimesy}})
in
outputtable

Here’s what this returns in the editor: a table that contains the value 40.

What if you disable the step called “y”?

Here’s what I saw in my Warehouse when I loaded the output of the query to it:

I had no idea what the output would be before I saw it but, thinking about it, I assume what has happened is that since the step “y” has been disabled, “y” simply returns the value of the previous step in the query, “x”, and therefore the output becomes 5*5=25. This understanding of how preview-only steps work is backed up by the fact that it is not possible to set step “x” to be preview-only. The option to do so is greyed out:
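
Put another way, my interpretation (based only on the output I saw, not on any documentation) is that at refresh time the query effectively behaves like this:

let
  x = 5,
  // the preview-only step y is skipped at refresh time and evaluates to the previous step, x
  y = x,
  xtimesy = x * y, // 5 * 5 = 25
  outputtable = #table(type table [xy = number], {{xtimesy}})
in
  outputtable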

This all leads on to a pattern I used in a Dataflow this week and which I can see myself using a lot more in the future. While being able to disable steps when the Dataflow refreshes is very useful, sometimes what you need is to write some conditional logic in your M code that does one thing if you’re in the editor and another thing if the Dataflow is refreshing. Here’s a slightly modified version of the first query above which I have called IsRefresh in my Dataflow:

let
RefreshValue = 1,
PreviewValue = RefreshValue-1
in
PreviewValue

With the PreviewValue step set to be preview-only:

…then what we have here is a query that returns 1 when the Dataflow is refreshing and 0 when you’re viewing its output in the editor. Here’s an example of how it can be used in a query:

let
Source = #table(
type table [
NumericValue = number,
TextValue = text
],
{
{
IsRefresh,
if IsRefresh = 1 then
"This is a refresh"
else
"This is a preview"
}
}
)
in
Source

This query returns the following when you preview the output in the editor:

But when the Dataflow refreshes, the output in the destination is this:
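
Another way I can imagine using the IsRefresh pattern (this is a sketch rather than something from the example above, and the source, table and column names are made up) is to load the full history of a table when the Dataflow refreshes but only recent rows while you’re working in the editor:

let
  Source = Sql.Database("myserver", "mydatabase"),
  SalesTable = Source{[Schema = "dbo", Item = "Sales"]}[Data],
  // Full table when the Dataflow refreshes; only the last seven days of data in the editor
  Output =
    if IsRefresh = 1 then
      SalesTable
    else
      Table.SelectRows(
        SalesTable,
        each [OrderDate] >= Date.AddDays(Date.From(DateTime.LocalNow()), -7)
      )
in
  Output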

Generating Sample Data With Copilot In Fabric Dataflows Gen2

As a Power Query fan I’m naturally interested in Copilot in Dataflows Gen2. But since Power Query is already very easy to use and since I’m pretty experienced at using it, can Copilot really do anything to help me be more productive? Well one thing I need to do a lot for blog posts, demos and testing is to generate tables of sample data. Typing data into the Enter Data source can be quite tedious and writing the M code to generate sample data isn’t straightforward, so I was wondering if Copilot in Dataflows Gen2 could do it for me. It turns out it can do so really well.

After a little bit of experimentation I’ve come up with an example prompt that can be easily modified to return a table of random values with columns of different data types and a given number of rows, and which works fairly reliably:

Return a table consisting of the following five columns: 
Country, containing one of 10 random country names; 
Product, containing one of 15 random fruit names; 
Person, containing one of 20 random first names that are common in the United Kingdom;
Date, containing random dates between January 1st 2019 and December 31st 2024;
Sales, containing random integers between 0 and 500. 
The table should have 1000 rows. Set an appropriate data type for each column.

Here’s an example of the M code generated:

let
    // Generate lists of random values
    Countries = {"United States", "Canada", "United Kingdom", "Germany", "France", "Italy", "Spain", "Australia", "India", "China"},
    Products = {"Apple", "Banana", "Cherry", "Date", "Elderberry", "Fig", "Grape", "Honeydew", "Kiwi", "Lemon", "Mango", "Nectarine", "Orange", "Papaya", "Quince"},
    People = {"James", "Mary", "John", "Patricia", "Robert", "Jennifer", "Michael", "Linda", "William", "Elizabeth", "David", "Barbara", "Richard", "Susan", "Joseph", "Jessica", "Thomas", "Sarah", "Charles", "Karen"},
    StartDate = #date(2019, 1, 1),
    EndDate = #date(2024, 12, 31),
    DaysBetween = Duration.Days(EndDate - StartDate) + 1,

    // Function to generate random dates
    RandomDate = () => Date.From(StartDate + #duration(Number.RoundDown(Number.RandomBetween(0, DaysBetween)), 0, 0, 0)),

    // Generate table records
    Records = List.Transform({1..1000}, each [
        Country = Countries{Number.Mod(_, List.Count(Countries))},
        Product = Products{Number.Mod(_, List.Count(Products))},
        Person = People{Number.Mod(_, List.Count(People))},
        Date = RandomDate(),
        Sales = Number.RoundDown(Number.RandomBetween(0, 500))
    ]),

    // Convert list of records to table and set data types
    Source = Table.FromRecords(Records),
    TypedTable = Table.TransformColumnTypes(Source, {
        {"Country", type text},
        {"Product", type text},
        {"Person", type text},
        {"Date", type date},
        {"Sales", Int32.Type}
    })
in
    TypedTable

And here’s an example of the output:

Definitely a time saver as far as I’m concerned. Is it totally reliable? No: it occasionally produces code that errors or which doesn’t contain genuinely random values, but it’s good enough and it’s faster to try the prompt once or twice than write the code myself. I know there are other, more sophisticated ways of generating sample data like this in Fabric, for example in Python, but as I said I’m a Power Query person.

And of course, for bonus points, we can now send the output of a Dataflow Gen2 to a CSV file in SharePoint which makes this even more useful:

Two New Power BI/Power Query Books

It’s time for another one of my occasional posts about free books that I’ve been given that you might be interested to check out. The usual disclaimers apply: these aren’t impartial, detailed reviews and the links contain an Amazon UK affiliate code so I get a kickback if you buy a copy.

Power Query Beyond The User Interface, by Chandeep Chhabra

Chandeep is of course famous on YouTube from the Goodly Power BI channel; I’ve met him at conferences and he’s just as friendly and passionate in real life. That passion shows through in this book. It’s aimed at people who already know Power Query and want to learn M, and while it does a great job of that, I think anyone who writes M on a regular basis would also benefit from reading it. It’s packed with practical examples, well-written, everything is clearly explained and it covers more recent additions to the language that older books might not talk about. It’s focused on the M language and doesn’t cover topics like performance tuning but I think that focus is a good thing. Highly recommended for anyone serious about Power Query.

Architecting Power BI Solutions In Microsoft Fabric, by Nagaraj Venkatesan

It looks like the Packt marketing machine has gone into overdrive for this title because I see both Greg Lowe and Sandeep Pawar have already published their reviews, and I agree with their sentiments. Power BI (and even more so Fabric) is complicated, so there’s a huge demand for guidance around what all of the components do and how to put them together to create a solution. The team I work on at Microsoft, the Fabric CAT team, has published guidance documentation here and other people have written books, blog posts and white papers addressing the same problem. This book is certainly a very useful addition to the existing literature. It covers newer topics like Power BI Copilot and some topics that are rarely if ever mentioned elsewhere, such as Power BI’s integration with Purview. As the other reviewers have mentioned, books like this always suffer from changes to the product making them out of date very quickly, but that’s unavoidable. Also, being written by a Microsoft employee (and this is something I can relate to), it’s not very opinionated and doesn’t tell you which features of the product are good and which ones should be avoided. All in all, pretty good though.

Reading Delta Metadata In Power Query

There’s a new M function rolling out now, DeltaLake.Metadata, that allows you to read metadata from Delta tables (at the time of writing it’s available in Dataflows Gen2 and will be available soon in Desktop). It builds on the DeltaLake.Table M function that allows you to read data from Delta tables and is similar to the Parquet.Metadata function that was released last year. Here’s an example of how to use it to get metadata from a Delta table in OneLake:

let
  Source = AzureStorage.DataLake(
    "https://onelake.dfs.fabric.microsoft.com/insertworkspaceidhere/insertlakehouseidhere/Tables/inserttablenamehere", 
    [HierarchicalNavigation = true]
  ), 
  ToDelta = DeltaLake.Metadata(
    DeltaLake.Table(Source)
  )
in
  ToDelta

The function returns a table of records containing the metadata from the Delta table such as the schema, how the table is partitioned, and whether the table is V-Ordered or not:

Improve Power Query Performance On CSV Files Containing Date Columns

A few weeks ago I replied to a question on reddit where someone was experiencing extremely slow performance when importing data from a CSV file using Power Query. The original poster worked out the cause of the problem and the solution themselves: they saw that removing all date columns made their query much faster, and that using the Date.FromText function and specifying the date format solved the problem. While I couldn’t reproduce the extreme slowness that was reported I was able to reproduce a performance difference between the two approaches, and Curt Hagenlocher of the Power Query team confirmed that this was expected behaviour.

Let’s see an example. I created a CSV file with five date columns and one million rows, then created a Power Query query to import this data into Power BI Desktop using the default M code generated by the Power Query Editor:

let
  Source = Csv.Document(
    File.Contents("C:\GenerateDates.csv"),
    [
      Delimiter  = ",",
      Columns    = 5,
      Encoding   = 65001,
      QuoteStyle = QuoteStyle.None
    ]
  ),
  #"Promoted Headers" = Table.PromoteHeaders(
    Source,
    [PromoteAllScalars = true]
  ),
  #"Changed Type" = Table.TransformColumnTypes(
    #"Promoted Headers",
    {
      {"Extra Spaces", type date},
      {"Extra Spaces - 2", type date},
      {"Extra Spaces - 3", type date},
      {"Extra Spaces - 4", type date},
      {"Extra Spaces - 5", type date}
    }
  )
in
  #"Changed Type"

The dates in the CSV file were in the following format:

02  Jan   1901

…and this is important: there are two spaces between the day and the month name and three spaces between the month name and the year.

Using SQL Server Profiler I found that this query took around 14 seconds to run.

I then created a second query that, instead of using Table.TransformColumnTypes to set the data type on the columns, used Date.FromText and the Format option:

let
  Source = Csv.Document(
    File.Contents("C:\GenerateDates.csv"),
    [
      Delimiter  = ",",
      Columns    = 5,
      Encoding   = 65001,
      QuoteStyle = QuoteStyle.None
    ]
  ),
  #"Promoted Headers" = Table.PromoteHeaders(
    Source,
    [PromoteAllScalars = true]
  ),
  DateConversionFunction = (inputDate) as date =>
    Date.FromText(
      inputDate,
      [Format = "dd  MMM   yyyy"]
    ),
  ChangeDate = Table.TransformColumns(
    #"Promoted Headers",
    {
      {
        "Extra Spaces",
        DateConversionFunction,
        Date.Type
      },
      {
        "Extra Spaces - 2",
        DateConversionFunction,
        Date.Type
      },
      {
        "Extra Spaces - 3",
        DateConversionFunction,
        Date.Type
      },
      {
        "Extra Spaces - 4",
        DateConversionFunction,
        Date.Type
      },
      {
        "Extra Spaces - 5",
        DateConversionFunction,
        Date.Type
      }
    }
  )
in
  ChangeDate

This version of the query took around 10.5 seconds to run, so not a huge improvement but a noticeable one. It’s certainly not the 6-7x performance improvement seen in the reddit post, but I’m sure different data, different date formats and different hardware might result in bigger differences.

I was told by Curt that when Power Query uses Table.TransformColumnTypes to parse date data from CSV files it tries a series of different date formats in order: first it tries ISO-8601 (for example 9th February 2025 would be “2025-02-09”), then a long date format, then a short date format, and finally it uses a generic .NET date parsing function which is slower than the others. It does this to make sure date parsing “just works” as often as possible. The dates in the example above, with the extra spaces, were deliberately designed to be slow for Table.TransformColumnTypes. When I tested on CSV files that contained dates in ISO-8601 format I found that Table.TransformColumnTypes performed the same as Date.FromText.
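
If you want to see the difference in isolation you can compare the two approaches on a single value, for example (illustration only):

let
  // ISO-8601 dates are handled by the fast path
  ISOExample = Date.FromText("2025-02-09"),
  // an explicit format (and culture) avoids falling back to the slower generic parser
  CustomFormatExample = Date.FromText("02  Jan   1901", [Format = "dd  MMM   yyyy", Culture = "en-GB"])
in
  {ISOExample, CustomFormatExample}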

So, to sum up, if you’re using CSV files containing date columns as a source for Power Query and you’re experiencing performance problems, try changing your M code to use Date.FromText instead of Table.TransformColumnTypes to set the data types on the date columns.

Module.Versions Function In Power Query

The ever-vigilant folks on the internet have spotted that there’s a new M function in the latest versions of Power BI and Excel: Module.Versions. This function, at the time of writing, returns a record with a single field that contains the version number of the Power Query engine currently in use. So for example if I have a Power Query query in Power BI Desktop that consists of the following code:

Module.Versions()

It returns the following:

…where 2.129.181.0 is the version of the Power Query engine in my build of Power BI Desktop.

This function was introduced for people developing Power Query custom connectors who only want to enable certain functionality if the user is running a given version of the Power Query engine or above. I guess if you’re sharing your own M custom functions on the internet then you might want to do the same thing.
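
Here’s a rough sketch of the kind of version check you could write. I’m deliberately not assuming the name of the field in the record returned by Module.Versions, so I just take the first field value; the version numbers being compared are arbitrary:

let
  // get the Power Query engine version as text, for example "2.129.181.0"
  EngineVersion = Text.From(Record.FieldValues(Module.Versions()){0}),
  Parts = Text.Split(EngineVersion, "."),
  // crude check on the first two parts of the version number
  IsNewEnough =
    Number.From(Parts{0}) > 2
    or (Number.From(Parts{0}) = 2 and Number.From(Parts{1}) >= 129)
in
  IsNewEnough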

[Thanks to Curt Hagenlocher for giving me the inside information here]

New Semi Join, Anti Join And Query Folding Functionality In Power Query

There are a couple of nice new features to do with table joins (or merges as they are known in M) and query folding in Power Query in the April release of Power BI Desktop that I want to highlight.

Anti Joins now fold

First of all, a few months ago I wrote a post about how the built-in anti join functionality didn’t fold in Power Query. The good news is that it now does on SQL Server-related sources, so no more workarounds are needed. For example, if you have two tables in a SQL Server database called Fruit1 and Fruit2 and two Power Query queries that get data from those tables:

…then the following M code:

let
  Source = Table.Join(
    Fruit1,
    {"Fruit1"},
    Fruit2,
    {"Fruit2"},
    JoinKind.LeftAnti
  )
in
  Source

…returns the following table of fruits that are in the Fruit1 table and not in the Fruit2 table:

Of course that’s what the code above returned in previous versions of Power Query too. The difference now is that query folding occurs and the following SQL code is generated:

select [$Outer].[Fruit1],
    cast(null as nvarchar(50)) as [Fruit2]
from 
(
    select [_].[Fruit] as [Fruit1]
    from [dbo].[Fruit1] as [_]
) as [$Outer]
where not exists 
(
    select 1
    from 
    (
        select [_].[Fruit] as [Fruit2]
        from [dbo].[Fruit2] as [_]
    ) as [$Inner]
    where [$Outer].[Fruit1] = [$Inner].[Fruit2] or [$Outer].[Fruit1] is null and [$Inner].[Fruit2] is null
)

New join kind: semi joins

There are also two brand new join kinds you can use in the Table.Join and Table.NestedJoin functions: JoinKind.LeftSemi and JoinKind.RightSemi. Semi joins allow you to select the rows in one table that have matching values in another table. Using the Fruit1 and Fruit2 tables above, the following M code:

let
  Source = Table.Join(
    Fruit1, 
    {"Fruit1"}, 
    Fruit2, 
    {"Fruit2"}, 
    JoinKind.LeftSemi
  )
in
  Source

…returns all the rows in Fruit1 where there is a matching value in Fruit2:

Here’s the SQL that is generated:

select [$Outer].[Fruit1],
    cast(null as nvarchar(50)) as [Fruit2]
from 
(
    select [_].[Fruit] as [Fruit1]
    from [dbo].[Fruit1] as [_]
) as [$Outer]
where exists 
(
    select 1
    from 
    (
        select [_].[Fruit] as [Fruit2]
        from [dbo].[Fruit2] as [_]
    ) as [$Inner]
    where [$Outer].[Fruit1] = [$Inner].[Fruit2] or [$Outer].[Fruit1] is null and [$Inner].[Fruit2] is null
)

The ?? operator now folds

The M language’s ?? coalesce operator is used for replacing null values, and it now folds on SQL Server-related sources too. For example, the M query in the previous section that did a semi join on Fruit1 and Fruit2 returns a table where all the rows in the Fruit2 column contain null values. The following M query adds a new custom column that returns the text value “Nothing” when the Fruit2 column contains a null:

let
  Source = Table.Join(
    Fruit1, 
    {"Fruit1"}, 
    Fruit2, 
    {"Fruit2"}, 
    JoinKind.LeftSemi
  ), 
  ReplaceNulls = Table.AddColumn(
    Source, 
    "NullReplacement", 
    each [Fruit2] ?? "Nothing"
  )
in
  ReplaceNulls

Here’s the SQL generated for this, where the ?? operator is folded to a CASE statement:

select [_].[Fruit1] as [Fruit1],
    [_].[Fruit2] as [Fruit2],
    case
        when [_].[Fruit2] is null
        then 'Nothing'
        else [_].[Fruit2]
    end as [NullReplacement]
from 
(
    select [$Outer].[Fruit1],
        cast(null as nvarchar(50)) as [Fruit2]
    from 
    (
        select [_].[Fruit] as [Fruit1]
        from [dbo].[Fruit1] as [_]
    ) as [$Outer]
    where exists 
    (
        select 1
        from 
        (
            select [_].[Fruit] as [Fruit2]
            from [dbo].[Fruit2] as [_]
        ) as [$Inner]
        where [$Outer].[Fruit1] = [$Inner].[Fruit2] or [$Outer].[Fruit1] is null and [$Inner].[Fruit2] is null
    )
) as [_]

[Thanks to Curt Hagenlocher for the information in this post]

Displaying Azure Maps In A Power BI Paginated Report

The built-in mapping functionality in Power BI paginated reports is fairly basic. However, the integration of Power Query into Power BI paginated reports gives you an interesting new way of creating maps in paginated reports: you can call the Azure Maps API using Power Query and display the image returned in an Image report item. In this blog post I’ll show you how.

Here’s a quick summary of what I’m going to do:

  • Call the API from https://data.police.uk/ (specifically the Crimes At Location endpoint) using Power Query to get all the recorded crimes within a one mile radius of a given latitude and longitude in a given month for any location in England, Wales or Northern Ireland
  • Take this list of crimes and pass them to the Azure Maps API Get Map Static Image endpoint to return an image of a map with the crime locations on it
  • Display this image in an Image report part in a paginated report

And here’s an example of what the final paginated report will look like:

Step 1: Sign up for the Azure Maps API

In order to call the Azure Maps API you’ll need to go to the Azure Portal and create a resource. The pricing is very reasonable: the first 1000 calls to the endpoint used here are free and after that it’s $4.50 per month for up to 500,000 calls, which should be more than enough for BI purposes.

Step 2: Create Shareable Cloud Connections

To connect to data sources in Power Query in paginated reports you need to create Shareable Cloud Connections in the Power BI portal. You’ll need two connections for this report: one for the Azure Maps API with the URL https://atlas.microsoft.com/map/static/png and one for the Crime API with the URL https://data.police.uk/api/crimes-street/all-crime. Both SCCs should have the authentication method Anonymous and the privacy level Public and have the Skip Test Connection option checked:

Step 3: Create a paginated report and Power Query query to call APIs

After creating a new paginated report in Power BI Report Builder you need to create a dataset (called AzureMap here) to get data from the APIs. This dataset uses Power Query as a source and has one main query (also called AzureMap) and four parameters:

  • lon and lat, to hold the latitude and longitude of the location to get crime data for, which will also be the centre point of the map
  • zoom, which is the zoom level of the map
  • yearmonth, which is the year and month in YYYY-MM format to get crime data for:

Here’s the M code for the query:

let
  CallCrimeAPI = Json.Document(
    Web.Contents(
      "https://data.police.uk/api/crimes-street/all-crime",
      [
        Query = [
          lat  = Text.From(lat),
          lng  = Text.From(lon),
          date = yearmonth
        ]
      ]
    )
  ),
  ToTable = Table.FromList(
    CallCrimeAPI,
    Splitter.SplitByNothing(),
    null,
    null,
    ExtraValues.Error
  ),
  First50 = Table.FirstN(ToTable, 50),
  ExpandColumn1 = Table.ExpandRecordColumn(
    First50,
    "Column1",
    {"location"},
    {"location"}
  ),
  Expandlocation = Table.ExpandRecordColumn(
    ExpandColumn1,
    "location",
    {"latitude", "street", "longitude"},
    {
      "location.latitude",
      "location.street",
      "location.longitude"
    }
  ),
  JustLatLon = Table.SelectColumns(
    Expandlocation,
    {"location.longitude", "location.latitude"}
  ),
  TypeToText = Table.TransformColumnTypes(
    JustLatLon,
    {
      {"location.longitude", type text},
      {"location.latitude", type text}
    }
  ),
  MergedColumns = Table.CombineColumns(
    TypeToText,
    {"location.longitude", "location.latitude"},
    Combiner.CombineTextByDelimiter(
      " ",
      QuoteStyle.None
    ),
    "LongLat"
  ),
  PrefixPipe = Table.TransformColumns(
    MergedColumns,
    {{"LongLat", each "|" & _, type text}}
  ),
  GetString = "|"
    & Text.Combine(PrefixPipe[LongLat]),
  QueryRecord = [
    #"subscription-key"
      = "InsertYourSubscriptionKeyHere",
    #"api-version" = "2022-08-01",
    layer = "basic",
    style = "main",
    #"zoom" = Text.From(zoom),
    center = Text.From(lon) & ", " & Text.From(lat),
    width = "768",
    height = "768"
  ],
  AddPins = try
    Record.AddField(
      QueryRecord,
      "pins",
      "default|sc0.5" & GetString
    )
  otherwise
    QueryRecord,
  CallAzureMapsAPI = Web.Contents(
    "https://atlas.microsoft.com/map/static/png",
    [Query = AddPins]
  ),
  ToText = Binary.ToText(
    CallAzureMapsAPI,
    BinaryEncoding.Base64
  ),
  OutputTable = #table(
    type table [image = text],
    {{ToText}}
  )
in
  OutputTable

You need to put all this code in a single M query to avoid the “Formula.Firewall: Query ‘Query1’ (step ‘xyz’) references other queries or steps, so it may not directly access a data source. Please rebuild this data combination.” error. You can find out more about this error by watching my data privacy video here.

A few things to note:

  • The CallCrimeAPI step calls the Get Crimes At Location API endpoint to get all the reported crimes within a one mile radius of the given latitude and longitude in the given year and month.
  • Because of the way I’m sending the crime location data to the Azure Maps API I limited the number of locations to 50, in the First50 step, to avoid hitting errors relating to the maximum length of a URL.
  • The GetString step returns a pipe delimited list of longitudes and latitudes of crime locations for the Azure Maps API to display as pins on the map. However, some error handling is needed in case there were no reported crimes in the given location or month and that happens in the AddPins step.
  • The QueryRecord step contains all the parameters to send to the Azure Maps Get Map Static Image endpoint. This docs page has more information on what’s possible with this API – I’m barely scratching the surface of what’s possible in this example.
  • Authentication to the Azure Maps API is via a subscription key which you’ll need to pass to the subscription-key parameter. You can get the key from the resource created in step 1 in the Azure Portal.
  • The API returns an image binary which is converted to text and returned in a table with one column and one row in the ToText and OutputTable steps. The code is similar to what I showed in this blog post but luckily I didn’t seem to need to break it up into multiple rows.

Step 4: Create Power Query query to return values for Zoom parameter

The Zoom parameter of the Get Map Static Image API endpoint accepts a value between 0 and 20, which represents the zoom level of the displayed map. You need to create a separate dataset and M query to return a table containing those values with the following code:

let
  Source = {0 .. 20}, 
  #"Converted to table" = Table.FromList(
    Source, 
    Splitter.SplitByNothing(), 
    null, 
    null, 
    ExtraValues.Error
  ), 
  #"Changed column type"
    = Table.TransformColumnTypes(
    #"Converted to table", 
    {{"Column1", Int64.Type}}
  ), 
  #"Renamed columns" = Table.RenameColumns(
    #"Changed column type", 
    {{"Column1", "Zoom"}}
  )
in
  #"Renamed columns"

Step 5: Create paginated report parameters

Next you need to create four parameters in the paginated report for the longitude, latitude, zoom level and year month:

To make it easy for end users to select a zoom level, you need to bind the available values for the zoom parameter to the table returned by the dataset from the previous step:

Step 6: Display the map in an Image report part

In the paginated report itself the only interesting thing is the configuration of the Image report part in the centre of the report:

You need to set the image source to “Database”, bind it to the following expression

=First(Fields!image.Value, "AzureMap")

…which gets the text value from the sole row and column in the table returned by the AzureMap dataset created in step 3, and set the MIME type to be “image/png”.

And that’s it! After publishing you can enter any latitude and longitude in England, Wales or Northern Ireland, a year and month, and a zoom level, and get all the reported crimes on a map:

You can download the .rdl file with the paginated report here (remember to edit the AzureMaps query to insert your Azure Maps API key).

Power BI Paginated Reports That Connect To Web Services And Excel

By far the most exciting announcement for me this week was the new release of Power BI Report Builder that has Power Query built in, allowing you to connect to far more data sources in paginated reports than you ever could before. There’s a very detailed blog post and video showing you how this new functionality works here:

https://powerbi.microsoft.com/en-us/blog/get-data-with-power-query-available-in-power-bi-report-builder-preview

The main justification for building this feature was to allow customers to build paginated reports on sources like Snowflake or BigQuery, something which had only been possible before if you used an ODBC connection via a gateway or built a semantic model in between – neither of which is an ideal solution. However it opens up a lot of other possibilities too.

For example, you can now build paginated reports on web services (with some limitations). I frequently get asked about building regular Power BI reports that get data from web services on demand – something which isn’t possible, as I explained here. To test using paginated reports on a web service I registered for Transport for London’s APIs and built a simple report on top of their Journey Planner API (Transport for London are the organisation that manages public transport in London). This report allows you to enter a journey starting point and ending point anywhere in or around London, calls the API and returns a table with different routes from the start to the destination, along with timings and instructions for each route. Here’s the report showing different routes for a journey from 10 Downing Street in London to Buckingham Palace:
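
If you’re curious what the Power Query side of that looks like, here’s a heavily simplified sketch. The TfL endpoint path, query parameter and response field names are written from memory, so check the TfL API documentation before relying on them, and the start and end points are hardcoded here rather than coming from report parameters as they do in my report:

let
  FromLocation = "10 Downing Street, London",
  ToLocation = "Buckingham Palace, London",
  // endpoint path and parameter names are assumptions based on TfL's public docs
  CallJourneyAPI = Json.Document(
    Web.Contents(
      "https://api.tfl.gov.uk/",
      [
        RelativePath = "Journey/JourneyResults/" & FromLocation & "/to/" & ToLocation,
        Query = [app_key = "InsertYourTfLAppKeyHere"]
      ]
    )
  ),
  // the response should contain a list of journeys; turn it into a table for the report
  Journeys = Table.FromList(
    CallJourneyAPI[journeys],
    Splitter.SplitByNothing(),
    null,
    null,
    ExtraValues.Error
  )
in
  Journeys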

You can also build paginated reports that connect to Excel workbooks that are stored in OneDrive or OneLake, meaning that changes made in the Excel workbook show up in the report as soon as the workbook is saved and closed:

So. Much. Fun. I’ll probably develop a presentation for user groups explaining how I built these reports soon.

And yes, if you need to export data to Excel on a schedule, paginated reports are now an even better choice. You know your users want this.