Generating Excel Reports With Fabric Dataflows Gen2

So many cool Fabric features get announced at Fabcon that it’s easy to miss some of them. A great example is the fact that you can now not only generate Excel files from Fabric Dataflows Gen2, but that you have so much control over the format that you can use this feature to build simple reports rather than plain old data dumps: it was only mentioned halfway through this blog post on new stuff in Dataflows Gen2. Nonetheless it was the Fabcon feature announcement that got me most excited, because it shows how Fabric Dataflows Gen2 have gone beyond being just a way to bring data into Fabric and are now a proper self-service ETL tool: you can extract data from a lot of different sources, transform it using Power Query, and load it to a variety of destinations both inside Fabric and outside it (such as CSV files, Snowflake and, yes, Excel).

The documentation for the new Excel destination, which you can find here, is extremely detailed, so I thought it would be useful to show a simple example of how you can now use Dataflows Gen2 to build an Excel report. First of all I created a query called FruitSales using the Enter Data source that returned a table with some sales data in it:

let
    Source = Table.FromRows(
        Json.Document(
            Binary.Decompress(
                Binary.FromText(
                    "i45WciwoyEktVtJRMlWK1YlW8i9KzEsH8w0NwAIBqYlFYK4RmOtelFgAkbZUio0FAA==",
                    BinaryEncoding.Base64
                ),
                Compression.Deflate
            )
        ),
        let
            _t = ((type nullable text) meta [Serialized.Text = true])
        in
            type table [Product = _t, Sales = _t]
    ),
    #"Changed column type" = Table.TransformColumnTypes(
        Source,
        {{"Product", type text}, {"Sales", Int64.Type}}
    )
in
    #"Changed column type"

I then created a query called ReportTitle that contained the text for my report’s title:

let
    Source = #table({"Title"}, {{"My Sales Report"}})
in
    Source

…and a query called FruitSalesOverview that passes the data from the FruitSales query to the FabricAI.Prompt M function to generate a text summary of it:

let
    Source = FabricAI.Prompt("Summarise the fruit sales data in 20 words or less", FruitSales),
    ToTable = #table({"Summary"}, {{Source}})
in
    ToTable

The last query I created, called Output, generated a navigation table in the format described in the docs to describe the Excel output: the report title in a range starting in cell B1, the report summary in a range starting in cell B3, the sales data in a table starting in cell B5 and a bar chart showing the sales data.

let
    excelDocument = #table(
        type table [
            Sheet = nullable text,
            Name = nullable text,
            PartType = nullable text,
            Properties = nullable record,
            Data = any
        ],
        {
            // Report title
            {"Sales", "Title", "Range", [StartCell = "B1", SkipHeader = true], ReportTitle},
            // Copilot-generated summary
            {"Sales", "Summary", "Range", [StartCell = "B3", SkipHeader = true], FruitSalesOverview},
            // Table containing sales data
            {
                "Sales",
                "SalesTable",
                "Table",
                [StartCell = "B5", TableStyle = "TableStyleMedium9"],
                FruitSales
            },
            // Column chart containing sales data
            {
                "Sales",
                "SalesChart",
                "Chart",
                [
                    ChartType = "Column",
                    ChartTitle = "Fruit Sales",
                    DataSeries = [AxisColumns = {"Product"}, ValueColumns = {"Sales"}]
                ],
                #table({}, {}) meta [Name = "SalesTable"]
            }
        }
    )
in
    excelDocument

I then set the Data Destination of the Output query to use the New File option to create an Excel file and save it to the Files section of a Fabric Lakehouse. The use of the Advanced format option meant that the navigation table returned by the Output query was used to determine the structure of the resulting Excel file.

After refreshing the Dataflow I downloaded the resulting Excel workbook from my Lakehouse. Here’s what it looked like:

Pretty fun. Does it give you full control over the format of the Excel file? No, not quite. Is it a somewhat code-heavy approach? Yes, but I suppose in the age of AI that doesn’t matter so much since you’re unlikely to write your own code (although, being old-school, I adapted the code above from the docs manually). Most importantly: is this a better way of dumping data to Excel and/or generating simple Excel reports in Fabric than paginated reports? Good question, especially since Power Query is now available in paginated reports. Paginated reports are harder to build (though you can generate rdl with AI too – I’ve done it) and it’s harder to control what a paginated report rendered as an Excel file looks like, but I suspect they may still have the edge if you want an Excel report. I’m not sure though: factors like CU cost and how long it takes to generate an Excel file using each approach would also need to be taken into account. If you just want to dump data to Excel, however, Dataflows Gen2 are probably a better option now.

New Performance Optimisation for Excel PivotTables Connected To Power BI Semantic Models

Some good news: an important optimisation has rolled out for Excel PivotTables connected to Power BI semantic models! Back in 2019 I wrote about a very common problem affecting the MDX generated by PivotTables connected to Analysis Services where, when subtotals are turned off and grand totals are turned on, the query nevertheless returns the subtotal values. This led to extremely slow, expensive MDX queries being run and a lot of complaints. The nice people on the Excel team have now fixed this problem and PivotTables connected to Power BI semantic models generate MDX queries that only return the values needed by the PivotTable.

Here’s an example of a PivotTable connected to a published Power BI semantic model:

Note that the subtotals have been turned off but the grand totals are displayed – this is important. Here’s the MDX query generated for this PivotTable:

SELECT NON EMPTY 
{ /* GTOPT-BEGIN CSECTIONS=2 */ 
 /* GTOPT-SECT-BEGIN-1 Desc:GrandTotal */ 
{([Property Transactions].[New].[All],[Property Type].[Property Type Name].[All])}
 /* GTOPT-SECT-END-1 */ 
,
 /* GTOPT-SECT-BEGIN-2 Desc:Detailed */ 
{Hierarchize(CrossJoin({[Property Transactions].[New].[New].AllMembers}, 
{([Property Type].[Property Type Name].[Property Type Name].AllMembers)}))}
 /* GTOPT-SECT-END-2 */ 
} /* GTOPT-END */ 
 DIMENSION PROPERTIES PARENT_UNIQUE_NAME,HIERARCHY_UNIQUE_NAME ON COLUMNS  
 FROM [Model] 
 WHERE ([Measures].[Count Of Sales]) 
 CELL PROPERTIES VALUE, FORMAT_STRING, LANGUAGE, BACK_COLOR, FORE_COLOR, FONT_FLAGS

And here’s what this query returns:

There are 11 values displayed in the PivotTable and the MDX query returns 11 values. It’s what you’d expect but as I said, up to now, Excel would have generated an MDX query that returned 13 values – a query that also requested the subtotal values that aren’t displayed.
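To illustrate the difference, here’s a sketch of the kind of MDX the old query pattern produced. This is an approximation rather than a query captured from an older Excel build, but the key point is accurate: crossjoining hierarchy-level AllMembers sets includes the All members, so the query also requests the subtotal tuples that the PivotTable never displays.

```
SELECT NON EMPTY
{
  Hierarchize(
    CrossJoin(
      // hierarchy-level AllMembers includes the All member, so this set
      // also returns subtotal tuples alongside the detail rows
      {[Property Transactions].[New].AllMembers},
      {[Property Type].[Property Type Name].AllMembers}
    )
  )
}
DIMENSION PROPERTIES PARENT_UNIQUE_NAME, HIERARCHY_UNIQUE_NAME ON COLUMNS
FROM [Model]
WHERE ([Measures].[Count Of Sales])
CELL PROPERTIES VALUE, FORMAT_STRING, LANGUAGE, BACK_COLOR, FORE_COLOR, FONT_FLAGS
```

Compare this with the optimised query shown above, which splits the grand total and the detail rows into two explicit sections and requests nothing else.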

This optimisation should now be rolled out to 100% of Excel users. You can tell if you are using the new query pattern by looking for comments in the MDX code with the text “GTOPT” in – they’re easy to spot in the query shown above. Right now the optimisation only happens for PivotTables connected to Power BI semantic models but I’ve been told that in future it should also happen for PivotTables connected to Azure Analysis Services and SSAS; this is because some server-side optimisations are necessary to make the new MDX perform as well as possible.

You might be thinking that, despite my excitement, this is a very niche scenario but I assure you it’s not: Excel users very frequently create PivotTables that are formatted to look like tables, and having subtotals turned off and grand totals turned on is a key part of this. The more fields that are put on rows the more subtotals there are to calculate and the more the overhead increases; it’s not uncommon to find situations where the number of subtotal values is much greater than the number of values actually displayed in the PivotTable.

This doesn’t solve all the performance problems associated with PivotTables and Power BI though and more work is planned for the future.

[Thanks to Yaakov Ben Noon for driving this work!]

Using Excel Copilot To Import Data With Power Query

Although it was announced in this blog post on the Microsoft 365 Insider blog recently, you might have missed the news that Excel Copilot can now generate Power Query queries. There are limitations for now: it can only be used to connect to other Excel files stored in OneDrive or SharePoint, and it can’t do any transformations in the queries it creates, but it’s still exciting news nonetheless. Well, the kind of news I get excited by at least.

Since the announcement blog post didn’t give many details of how it works let’s see an example of it in action. Let’s say you have an Excel workbook called SprocketsWidgetsSales.xlsx that contains a table of data showing sales of sprockets and widgets – the products your company sells – by country:

Now let’s say you create a new, blank workbook and open the Copilot pane. Entering the prompt:

Search for data on sales of sprockets and widgets

…gives you the data from the first workbook in the response:

At the bottom you can see a citation reference pointing to the workbook containing the source data and clicking that reference opens that workbook in Excel Online, but we don’t want to do that, we want to load the data into the current workbook using Power Query. Clicking on “Show tables to import” shows a preview of all the Excel tables (in this case there’s only one) in the workbook:

Expanding “Show import query” shows the M code for the Power Query query it can generate:
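I won’t reproduce the exact code Copilot generated, but based on how Power Query normally connects to workbooks in SharePoint or OneDrive it looks something like this sketch (the URL and table name here are hypothetical):

```
let
    // Connect to the source workbook stored in SharePoint/OneDrive (hypothetical URL)
    Source = Excel.Workbook(
        Web.Contents("https://contoso.sharepoint.com/Documents/SprocketsWidgetsSales.xlsx"),
        null,
        true
    ),
    // Get the Excel table containing the sales data (hypothetical table name)
    SalesTable = Source{[Item = "SalesTable", Kind = "Table"]}[Data]
in
    SalesTable
```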

And clicking “Import to new sheet” creates that Power Query query and runs it:

You can see the Power Query query it creates in the Queries & Connections pane and edit it in the Power Query Editor like any other query:

Here’s the output of the query in a table on a new worksheet:

Of course now you have the table of data on your worksheet you can do other things like:

chart this data by country and product

…or ask questions like:

which country had the lowest sales of sprockets?

…and other things that you’d expect Copilot to be able to do. But the key thing is that Copilot can now generate Power Query queries! I’m looking forward to seeing how this feature improves in the future.

Improve Power Query Performance On CSV Files Containing Date Columns

A few weeks ago I replied to a question on reddit where someone was experiencing extremely slow performance when importing data from a CSV file using Power Query. The original poster worked out the cause of the problem, and the solution, themselves: removing all the date columns from the query made it much faster, and using the Date.FromText function and specifying the date format solved the problem. While I couldn’t reproduce the extreme slowness that was reported, I was able to reproduce a performance difference between the two approaches, and Curt Hagenlocher of the Power Query team confirmed that this was expected behaviour.

Let’s see an example. I created a CSV file with five date columns and one million rows, then created a Power Query query to import this data into Power BI Desktop using the default M code generated by the Power Query Editor:

let
  Source = Csv.Document(
    File.Contents("C:\GenerateDates.csv"),
    [
      Delimiter  = ",",
      Columns    = 5,
      Encoding   = 65001,
      QuoteStyle = QuoteStyle.None
    ]
  ),
  #"Promoted Headers" = Table.PromoteHeaders(
    Source,
    [PromoteAllScalars = true]
  ),
  #"Changed Type" = Table.TransformColumnTypes(
    #"Promoted Headers",
    {
      {"Extra Spaces", type date},
      {"Extra Spaces - 2", type date},
      {"Extra Spaces - 3", type date},
      {"Extra Spaces - 4", type date},
      {"Extra Spaces - 5", type date}
    }
  )
in
  #"Changed Type"

The dates in the CSV file were in the following format:

02  Jan   1901

…and this is important: there are two spaces between the day and the month name and three spaces between the month name and the year.

Using SQL Server Profiler I found that this query took around 14 seconds to run.

I then created a second query that, instead of using Table.TransformColumnTypes to set the data type on the columns, used Date.FromText and the Format option:

let
  Source = Csv.Document(
    File.Contents("C:\GenerateDates.csv"),
    [
      Delimiter  = ",",
      Columns    = 5,
      Encoding   = 65001,
      QuoteStyle = QuoteStyle.None
    ]
  ),
  #"Promoted Headers" = Table.PromoteHeaders(
    Source,
    [PromoteAllScalars = true]
  ),
  DateConversionFunction = (inputDate) as date =>
    Date.FromText(
      inputDate,
      [Format = "dd  MMM   yyyy"]
    ),
  ChangeDate = Table.TransformColumns(
    #"Promoted Headers",
    {
      {
        "Extra Spaces",
        DateConversionFunction,
        Date.Type
      },
      {
        "Extra Spaces - 2",
        DateConversionFunction,
        Date.Type
      },
      {
        "Extra Spaces - 3",
        DateConversionFunction,
        Date.Type
      },
      {
        "Extra Spaces - 4",
        DateConversionFunction,
        Date.Type
      },
      {
        "Extra Spaces - 5",
        DateConversionFunction,
        Date.Type
      }
    }
  )
in
  ChangeDate

This version of the query took around 10.5 seconds to run, so not a huge improvement but a noticeable one. It’s certainly not the 6/7x performance improvement seen on the reddit post but I’m sure different data, different date formats and different hardware might result in bigger differences.

I was told by Curt that when Power Query uses Table.TransformColumnTypes to parse date data from CSV files it tries a series of different date formats in order: first it tries ISO-8601 (for example, 9th February 2025 would be “2025-02-09”), then a long date format, then a short date format, and finally it falls back to a generic .NET date parsing function which is slower than the others. It does this to make sure date parsing “just works” as often as possible. The dates in the example above, with the extra spaces, were deliberately designed to be slow for Table.TransformColumnTypes. When I tested on CSV files that contained dates in ISO-8601 format I found that Table.TransformColumnTypes performed the same as Date.FromText.

So, to sum up, if you’re using CSV files containing date columns as a source for Power Query and you’re experiencing performance problems, try changing your M code to use Date.FromText instead of Table.TransformColumnTypes to set the data types on the date columns.

Refreshing A Power BI Semantic Model With Eventstreams, Pipelines And Activator

Following on from my last post where I showed how to send data from Power Automate to a Fabric Eventstream, in this post I’m going to show how to use it to solve one of my favourite problems: refreshing a Power BI semantic model that uses an Excel workbook stored in OneDrive as a source when that Excel workbook is modified.

Now before I go on I want to be clear that I know this is a ridiculously over-engineered and expensive (in terms of CUs) solution, and that you can do almost the same thing just using Power Automate or in several other different ways – see my colleague Mark Pryce-Maher’s recent videos on using Fabric Open Mirroring with Excel for example. I’m doing this to teach myself Fabric Eventstreams and Activator and see what’s possible with them. Please excuse any mistakes or bad practices.

To start off, I built a Power BI semantic model containing a single table, connected to an Excel workbook stored in OneDrive for Business:

I then built a Fabric Pipeline to refresh the semantic model, using the newly-enhanced Semantic Model Refresh activity:

Next, I created a new Fabric Eventstream and created a slightly more sophisticated version of the Power Automate flow in my previous post that ran every 30 seconds, checked to see if the Excel file was modified, and if it was, sent an event to the Eventstream:

The Compose action in this flow generates some JSON that is sent to the Eventstream and contains the name of the file that has been altered and a dummy value called EventCounter that always contains 1:
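The payload itself is tiny: something along these lines, with the field names as described above and an illustrative file name.

```json
{
  "FileName": "SalesData.xlsx",
  "EventCounter": 1
}
```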

I then connected the Eventstream to a new Activator:

In the Activator, I created a new object called ExcelWorkbook. I used the FileName field as the unique identifier and added the EventCounter field as a property:

Finally I created a new rule on the EventCounter property that aggregated the events over 1 minute windows and then fired when the total number of events in each window changed to the value 0 and stayed at 0 for 1 minute:

This is where the real value of Activator’s rules comes in. As I said, it’s easy using other approaches to refresh a semantic model when an Excel file changes. The problem with doing this, however, is that Excel files in OneDrive for Business autosave on a regular basis and if you make several changes to the file over the space of several minutes, you’ll end up trying to refresh the semantic model all the time. This can lead to excessive CU usage on your capacity because you’re refreshing too often as well as situations where one refresh fails because another is already in progress. What the rule above does is only trigger a refresh when changes to the Excel workbook have been detected (so the Total is greater than 0), then there has been a minute where no changes have been detected (so the Total changes to 0) and there has been a further minute where no changes have been detected (so the Total stays at 0). Since Activator rules can trigger Fabric items, I hooked my rule up to the Pipeline shown above to run the refresh:

This screenshot of the Definition tab of the rule shows a pattern of events where there were four one-minute windows where the Excel file was edited and the rule fired three times:

The problem with this rule is that if the Excel file is edited constantly for a long period, so that there is no period of inactivity, the semantic model would not get refreshed until that period of activity is over, which might be too long to wait. I guess the solution would be another rule that detects periods of, say, 10 minutes when the file has been edited constantly and refreshes the model even if there is no period of inactivity.

All in all I think this shows how the combination of Power BI/Power Platform and new features in Fabric can be used to solve old problems in new, interesting ways. I’m looking forward to learning more about Fabric Real-Time Intelligence and Activator!

[Thanks to James Hutton from the Activator team for his help with this post]

The “DataFormat.Error: File contains corrupted data” Error In Power Query

When you’re using Excel workbooks as a data source in Power Query in either Excel or Power BI you may encounter the following error when trying to connect:

DataFormat.Error: File contains corrupted data.

There are three possible causes of this error that I know of, two of which are well documented. First, the file may actually be corrupt in which case I hope you have a backup! Second, the Excel file may be password protected. Power Query cannot connect to a password protected Excel file so you’ll need to remove the password protection before you can use it as a source.

The third is one I ran into this week and it confused me a lot. It turns out that if the Excel file has a sensitivity label applied to it that results in it being encrypted, Power Query in Excel will not be able to connect to it – although Power Query in Power BI will. When connecting to an encrypted Excel file, Power Query in Power BI can decrypt the file using the credentials of the logged-in user and read it, but (at least at the time of writing) Power Query in Excel cannot do this, so you get the error above. In my case I had an Excel workbook that used Power Query to connect to a few other Excel workbooks, load data from them, and do some tax calculations. It was working a few months ago but when I reopened it this week I got the “DataFormat.Error” error and couldn’t work out why. It turned out that in the meantime an admin at Microsoft had applied a company-wide policy that automatically applied a highly restrictive sensitivity label to all workbooks stored in OneDrive for Business – which meant my Power Query queries stopped working. As soon as I changed the sensitivity label on my source workbooks to “Non Business”, so they weren’t encrypted, everything worked again.

[Thanks to Curt Hagenlocher for this information]

Module.Versions Function In Power Query

The ever-vigilant folks on the internet have spotted that there’s a new M function in the latest versions of Power BI and Excel: Module.Versions. This function, at the time of writing, returns a record with a single field that contains the version number of the Power Query engine currently in use. So for example if I have a Power Query query in Power BI Desktop that consists of the following code:

Module.Versions()

It returns the following:

…where 2.129.181.0 is the version of the Power Query engine in my build of Power BI Desktop.

This function was introduced for people developing Power Query custom connectors who only want to enable certain functionality if the user is running a given version of the Power Query engine or above. I guess if you’re sharing your own M custom functions on the internet then you might want to do the same thing.
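For example, a connector or shared function might gate new functionality on the engine version with something along these lines. This is just a sketch: the documentation doesn’t (yet) name the field in the returned record, so accessing it positionally and the particular version comparison logic here are my assumptions.

```
let
    // Get the Power Query engine version as text, e.g. "2.129.181.0"
    // (the field name isn't documented, so take the first field value)
    VersionText = Record.FieldValues(Module.Versions()){0},
    // Split into numeric parts: {2, 129, 181, 0}
    Parts = List.Transform(Text.Split(VersionText, "."), Number.FromText),
    // Only enable the new feature on engine version 2.129 or later
    SupportsNewFeature = (Parts{0} > 2) or (Parts{0} = 2 and Parts{1} >= 129)
in
    SupportsNewFeature
```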

[Thanks to Curt Hagenlocher for giving me the inside information here]

Understanding The “Evaluation resulted in a stack overflow” Error In Power Query In Excel And Power BI

If you’re writing your own M code in Power Query in Power BI or Excel you may run into the following error:

Expression.Error: Evaluation resulted in a stack overflow and cannot continue.

If you’re a programmer you probably know what a stack overflow is; if you’re not you might search for the term, find this Wikipedia article and still have no clue what has happened. Either way it may still be difficult to understand why you’re running into it in Power Query. To explain let’s see some examples.

First, a really simple one. You can write recursive functions – functions that call themselves – in Power Query, although as you might suspect that is generally a bad idea because they are difficult to write, difficult to understand, slow and may result in the error above. Consider the following function called MyRecursiveFunction which references a parameter of type number called MaxDepth:

(counter as number) as number =>
    if counter > MaxDepth then counter else @MyRecursiveFunction(counter + 1)

The function takes a number and calls itself, passing in a value one greater than the number that was passed to it, until the number passed is greater than MaxDepth. So, if the value of MaxDepth is 3 and you call the function and pass it the value of 1, like so:

MyRecursiveFunction(1)

…then you’ll get the value 4 back:

So far so good. But how long can a function in M go on calling itself? As you can imagine, not forever: when you hit the point where the function can’t call itself any more, you get the error above. For example, if you try setting MaxDepth to 100000 then you’ll get the stack overflow error above instead of 100001:

As a result it’s almost always a good idea to avoid recursion in Power Query and use functions like List.Transform, List.Generate or List.Accumulate to achieve the same result. A great example of this is shown in the Power Query custom connector documentation in the section on handling APIs that return results broken up into pages with the Table.GenerateByPage code sample.
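To show what that looks like in practice, here’s a sketch of the same counting logic rewritten without recursion using List.Generate (assuming the same MaxDepth parameter as above):

```
let
    // Generate the sequence 1, 2, 3, ... until the counter exceeds MaxDepth
    Counters = List.Generate(
        () => 1,
        each _ <= MaxDepth + 1,
        each _ + 1
    ),
    // The last value generated is the same result the recursive function returned
    Result = List.Last(Counters)
in
    Result
```

With MaxDepth set to 3 this returns 4, the same as calling MyRecursiveFunction(1), but there is no call stack to overflow so it works for arbitrarily large values of MaxDepth.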

You may still get this error even when you’re not explicitly using recursion though, as a result of lazy evaluation. Consider the following query which uses List.Accumulate to generate a table with a given number of rows:

let
    //define a table with one row and one column called x
    MyTable = #table(type table [x = number], {{1}}),
    //specify how many rows we want in our output table
    NumberOfTimes = 3,
    //use List.Accumulate to create a table with this number of rows
    //by calling Table.Combine
    CombineAllTables = List.Accumulate(
        {1 .. NumberOfTimes},
        null,
        (state, current) => if current = 1 then MyTable else Table.Combine({state, MyTable})
    )
in
    CombineAllTables

Here’s the output, a table with three rows:

But how does it get this result? With NumberOfTimes=3 you can think of this query lazily building up an M expression something like this:

Table.Combine({Table.Combine({MyTable, MyTable}), MyTable})

…which, once List.Accumulate has finished, suddenly all has to be evaluated and turned into a single table. Imagine how much nesting of Table.Combine there would be if NumberOfTimes was a much larger number though! And indeed, it turns out that you can’t make lots and lots of calls to Table.Combine without running into a stack overflow. So if NumberOfTimes=100000 like so:

let
    //define a table with one row and one column called x
    MyTable = #table(type table [x = number], {{1}}),
    //specify how many rows we want in our output table
    NumberOfTimes = 100000,
    //use List.Accumulate to create a table with this number of rows
    //by calling Table.Combine
    CombineAllTables = List.Accumulate(
        {1 .. NumberOfTimes},
        null,
        (state, current) => if current = 1 then MyTable else Table.Combine({state, MyTable})
    )
in
    CombineAllTables

…then, after a minute or so, you get the “Evaluation resulted in a stack overflow and cannot continue” error again.

Rewriting the query so you build up the list of tables first and only call Table.Combine once at the end avoids the problem and is much faster:

let
    //define a table with one row and one column called x
    MyTable = #table(type table [x = number], {{1}}),
    //specify how many rows we want in our output table
    NumberOfTimes = 100000,
    //create a table with NumberOfTimes rows
    CombineAllTables = Table.Combine(List.Repeat({MyTable}, NumberOfTimes))
in
    CombineAllTables

It’s also possible to solve the problem by forcing eager evaluation inside List.Accumulate but this is extremely tricky: there’s an example of this on Gil Raviv’s blog here.

Power Query Nested Data Types In Excel

A year ago support for nested data types in Excel was announced on the Excel blog, but the announcement didn’t have much detail about what nested data types are and the docs are quite vague too. I was recently asked how to create a nested data type and while it turns out to be quite easy, I thought it would be good to write a short post showing how to do it.

Let’s say you have a Power Query query that returns details of the different types of fruit that you sell in your shop:

Let’s also say that the last three columns in this table (Grower, Address and Specialty) all relate to the farmer that supplies you with this fruit. Now you could create one Excel data type with all these columns in, but nested data types allow you to create data types inside data types, so in this case you can create a data type specifically for these three columns relating to the farmer and then nest it inside the main data type.

To do this, select these three columns in the Power Query Editor and click the Create Data Type button on the Transform tab in the ribbon:

Give the data type a name, in this case Farmer, in the Create Data Type dialog:

At this point you’ll have a query that returns a table where one of the columns, Farmer, contains a data type:

Finally, you then select all the columns in this table, including the Farmer column, and click the Create Data Type button again to create another new data type, this time called Product:

Here’s what you’ll see in the Power Query Editor once you’ve done this:

And here’s the result in the Power Query Editor:

Once you’ve loaded this query to the worksheet you can explore the nested data type via the card popup:

Or you can access the data in the nested data type using a formula. For example, in the animated gif above, cell A2 contains the data type for Apples, so the formula

=A2.Farmer.Address

…returns the address of the farmer who grows apples.

Alternatively, you can use the Excel FieldValue function to get the same result:

=FIELDVALUE(FIELDVALUE(A2, "Farmer"), "Address")

Anti Joins And Query Folding In Power Query In Excel And Power BI

Power Query allows you to merge (“join” in database terms) two tables together in a variety of different ways, including left and right anti joins. Unfortunately, as I found recently, anti joins don’t fold on SQL Server-related data sources, which can result in performance problems. Luckily there is a different way of doing anti joins that does fold.

Say you have two Power Query queries called Fruit1 and Fruit2 that get data from SQL Server tables containing the names of different varieties of fruit:

Now, let’s say you want to get a list of all the fruit varieties that are in Fruit1 and not in Fruit2. The obvious thing to do is to do a Merge and use the Left Anti option like so:

Here’s the M code, including an extra step to remove the join column that this creates:

let
    Source = Table.NestedJoin(Fruit1, {"Fruit"}, Fruit2, {"Fruit"}, "Fruit2", JoinKind.LeftAnti),
    #"Removed Other Columns" = Table.SelectColumns(Source,{"Fruit"})
in
    #"Removed Other Columns"

This gives you the correct result:

…but it doesn’t fold, so performance may be bad.

However, if you do a Merge and use the Left Outer option instead:

Then expand the join column (called Fruit2.Fruit here):

And then filter on that column so you only keep the rows where it contains the value null, and then remove that column, you get the same result:

Here’s the M:

let
    Source = Table.NestedJoin(Fruit1, {"Fruit"}, Fruit2, {"Fruit"}, "Fruit2", JoinKind.LeftOuter),
    #"Expanded Fruit2" = Table.ExpandTableColumn(Source, "Fruit2", {"Fruit"}, {"Fruit2.Fruit"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded Fruit2", each ([Fruit2.Fruit] = null)),
    #"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"Fruit"})
in
    #"Removed Other Columns"

This now does fold (meaning performance should be better) and gives you the following SQL:

select [_].[Fruit]
from 
(
    select [$Outer].[Fruit],
        [$Inner].[Fruit2]
    from [dbo].[Fruit1] as [$Outer]
    left outer join 
    (
        select [_].[Fruit] as [Fruit2]
        from [dbo].[Fruit2] as [_]
    ) as [$Inner] on ([$Outer].[Fruit] = [$Inner].[Fruit2] or [$Outer].[Fruit] is null and [$Inner].[Fruit2] is null)
) as [_]
where [_].[Fruit2] is null