Excel · M · Power BI · Power Query

Anti Joins And Query Folding In Power Query In Excel And Power BI

Power Query allows you to merge (“join” in database terms) two tables together in a variety of different ways, including left and right anti joins. Unfortunately, as I found recently, anti joins don’t fold on SQL Server-related data sources, which can result in performance problems. Luckily there is a different way of doing anti joins that does fold.

Say you have two Power Query queries called Fruit1 and Fruit2 that get data from SQL Server tables containing the names of different varieties of fruit:

Now, let’s say you want to get a list of all the fruit varieties that are in Fruit1 and not in Fruit2. The obvious thing to do is to do a Merge and use the Left Anti option like so:

Here’s the M code, including an extra step to remove the join column that this creates:

let
    Source = Table.NestedJoin(Fruit1, {"Fruit"}, Fruit2, {"Fruit"}, "Fruit2", JoinKind.LeftAnti),
    #"Removed Other Columns" = Table.SelectColumns(Source,{"Fruit"})
in
    #"Removed Other Columns"

This gives you the correct result:

…but it doesn’t fold, so performance may be bad.

However, if you do a Merge and use the Left Outer option instead:

Then expand the join column (called Fruit2.Fruit here):

And then filter on that column so you only keep the rows where it contains the value null, and then remove that column, you get the same result:

Here’s the M:

let
    Source = Table.NestedJoin(Fruit1, {"Fruit"}, Fruit2, {"Fruit"}, "Fruit2", JoinKind.LeftOuter),
    #"Expanded Fruit2" = Table.ExpandTableColumn(Source, "Fruit2", {"Fruit"}, {"Fruit2.Fruit"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded Fruit2", each ([Fruit2.Fruit] = null)),
    #"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"Fruit"})
in
    #"Removed Other Columns"

This now does fold (meaning performance should be better) and gives you the following SQL:

select [_].[Fruit]
from 
(
    select [$Outer].[Fruit],
        [$Inner].[Fruit2]
    from [dbo].[Fruit1] as [$Outer]
    left outer join 
    (
        select [_].[Fruit] as [Fruit2]
        from [dbo].[Fruit2] as [_]
    ) as [$Inner] on ([$Outer].[Fruit] = [$Inner].[Fruit2] or [$Outer].[Fruit] is null and [$Inner].[Fruit2] is null)
) as [_]
where [_].[Fruit2] is null
Excel · Performance Tuning · Power BI · Power Query

Excel Workbook Layout And The Performance Of Reading Data With Power Query In Power BI

Excel workbooks are one of the slowest data sources you can use with Power Query in Excel or Power BI. Reading small amounts of data from small workbooks is usually fast; reading large amounts of data from large workbooks can be very slow. But what about reading small amounts of data from large Excel workbooks? I did some tests and it turns out that performance can vary a lot depending on where your data is in the workbook and how that workbook is structured.

[Note: in this post I’ll be looking at .xlsx files, rather than other Excel formats like .xls and .xlsb; Excel files stored on a local disk and accessed via the File.Contents M function rather than stored in SharePoint or any other location; data read from Excel tables rather than direct from the worksheet; and Power Query in Power BI. Other scenarios may perform differently.]

Let’s see a simple example to illustrate what I found out. I created a new Excel workbook with one worksheet in and put a small table of data on it:

At this point the workbook’s size was 11KB. I then opened Power BI Desktop and created a Power Query query that read this table of data from the Excel workbook:

let
    Source = Excel.Workbook(File.Contents("C:\MyWorkbook.xlsx"), null, true),
    Sales_Table = Source{[Item="Sales",Kind="Table"]}[Data],
    #"Changed Type" = Table.TransformColumnTypes(Sales_Table,{{"Product", type text}, {"Sales", Int64.Type}})
in
    #"Changed Type"

Then I used this technique to measure how long it took to load the data from Excel into Power BI. Unsurprisingly, it was extremely fast: 63ms.

Then I added a new worksheet to the workbook, copied the same table onto it, added a large amount of random numbers underneath using the following Excel formula, and then copied and pasted the values returned by the formula over the output of the formula:

=RANDARRAY(9999,300)

Doing this meant the size of the workbook grew to 43MB. I then created a new Power Query query in Power BI Desktop, identical to the one above except that it connected to the new table. This time the query took 4918ms – almost 5 seconds.

Interestingly, even with the second worksheet with all the data on was added, the first query above (on the worksheet with no other data on) was still fast. I also tested refreshing a Power BI dataset that connected to two identical small tables on different worksheets in the same workbook, both with large amounts of other data on as in the second scenario above, and the performance of both queries was only slightly slower: it was clear two Power Query queries can read data from the same Excel workbook in parallel.

So: reading a small amount of data from a table on a worksheet with a large amount of other data on it is very slow.

What can we learn from this? Well, if you can influence the structure and layout of the Excel workbooks you are using as a data source – and that’s a big if, because in most cases you can’t – and you only need to read some of the data from them, you should put the tables of data you are using as a source on separate worksheets and not on the same worksheet as any other large ranges or tables of data.

It turns out that when the Power Query Excel connector reads data from an .xlsx file it can deserialise just some of the data in it rather than the whole thing, but what it can and can’t avoid deserialising depends a lot on the structure of the workbook and how the data is stored within the workbook .xlsx file. If you’re quick you can even see how much data is being read in Power BI Desktop in the refresh dialog:

You can also use Process Monitor, as I describe here, to see how much data is being read from any file used by Power Query.

Performance also depends on which application generated the .xlsx file (it’s not just Excel that creates .xlsx files, because other applications export data to .xlsx format without using Excel) or even which version of Excel saved the .xlsx file. This is because the same data can be stored in an .xlsx file in different ways, some of which may be more efficient to read than others. I found this blog post by Brendan Long on the .xlsx file format was very clear and it helped me understand how Power Query might go about reading data from an .xlsx file.

[Thanks to Curt Hagenlocher of the Power Query team for answering some of my questions relating to this post]

DAX · Excel · Fabric · Power BI · Python

Analyse Power BI Data In Excel With Python

In the Power BI/Fabric community everyone is excited about the recent release of Semantic Link: the ability to analyse Power BI data easily using Python in Fabric notebooks. Sandeep Pawar has an excellent blog post here explaining what this is and why it’s so cool. Meanwhile in the Excel community, everyone is excited about the new integration of Python into Excel. But can you analyse Power BI data in Excel using Python? Yes you can – so as my teenage daughter would say, it’s time for a crossover episode.

Let’s see a simple example. The main problem to solve is how to get data from Power BI into Excel in a format that Python in Excel can consume easily, ie a table rather than a PivotTable. The easiest way to do this is using the new Excel Connected Table feature, described here, which creates a live connection back to Power BI so when the data in the dataset changes the data in Excel is updated too. I have a Power BI dataset published to the Service that contains data from the UK government’s Land Registry Price Paid data which has details of all the property transactions in England and Wales; I found this in Excel and clicked Insert Table:

I then defined a query that found the number of property transactions and average price paid broken down by county:

This gave me a table, connected to the dataset in the Power BI Service using a DAX query, in my worksheet:

I renamed this table to “Sales”:

Unfortunately you can’t change the ugly column names without rewriting the DAX query behind the table, which makes life more difficult later on.

Then, on a new worksheet, I created a Python code cell using the PY function and entered the following Python code:

df=xl("Sales[#All]", headers=True)
s = plt.scatter(df.iloc[:,1], df.iloc[:,2], marker=11)
s.axes.xaxis.set_label_text("Count Of Sales")
s.axes.yaxis.set_label_text("Average Price Paid")

[I’ve only just started learning Python so please excuse any newbie bad practices/mistakes here! The main point is to visualise the data returned from Power BI]

This gave me a scatterplot with each county as a marker, the count of sales measure on the x axis and the average price paid measure on the y axis:

A few comments:

  • The xl function allows you to reference Excel cells, ranges and tables in your code; the reference to Sales[#All] gets the whole of the Sales table, including headers; adding headers=True means the table headers are recognised as such
  • Dealing with those ugly column names in Python is such a pain that I copped out and referenced the columns by position
  • After entering the code and committing it, you also need to tell Excel to treat the output as an Excel Value rather than a Python object to see the scatterplot; you also need to resize the cell

A second way of getting data into Excel from Power BI is to export the data from a published Power BI report. If you’re going to do that, you should export as a connected table so again the data stays connected to the source Power BI dataset.

There’s also a third , slightly different way of getting data from Power BI into Excel that is possible if you have Premium and which is a bit more complex but also more flexible: you can use Power Query, but maybe not in the way you would expect. The xl function can reference the output of a Power Query query even if that query is not loaded to a worksheet or the Excel Data Model – which I think is a nice touch and important if you’re working with larger data volumes.

To get data from Power BI into Excel using Power Query you need to use Power Query’s Analysis Services connector to connect to your workspace’s XMLA Endpoint. Go to the Data tab in Excel, click the Get Data button then From Database/From SQL Server Analysis Services Database (Import):

On the connection dialog the XMLA Endpoint goes into the Server box, the name of the dataset goes into the Database box and you can paste a DAX query into the MDX or DAX query box:

There are several benefits to using this approach:

  • You can use your own DAX query rather than have one generated for you
  • You can easily edit the DAX query after you have created the Power Query query
  • You can rename the query as well as all those ugly column names, making them easier to work with in Python – I named my query SalesByCounty and renamed my columns to County, CountOfSales and AveragePricePaid

I then closed the Power Query Editor without loading the output of the query anywhere.

You can read more about how to use Power Query queries in Python in Excel here.

Finally, here’s the modified version of the Python code to create the scatterplot shown above:

df=xl("SalesByCounty")
s = plt.scatter(df.CountOfSales, df.AveragePricePaid, marker=11)
s.axes.xaxis.set_label_text("Count Of Sales")
s.axes.yaxis.set_label_text("Average Price Paid")

Note how, in the first line, I can reference the Power Query query by name in the xl function and how, in the second line, renaming the columns in Power Query makes writing the Python code much easier.

Is this actually going to be useful to anyone? Well if Python in Excel is going to be used, it will be used by data analysts who love both Excel and Python – and who are also likely to use Power BI too. As Sandeep argues in the blog post about Semantic Link referenced above there are several reasons why these data analysts should use a Power BI dataset as a data source for their work rather than going back to the raw data: for example they can be sure they are using exactly the same data that is being used in their reports and they can use measures defined in the dataset rather than have to recreate the same calculations in their own code. While Semantic Link in Fabric is much more powerful than anything you can do in Excel with Power BI data, it’s only available in Fabric notebooks and this needs a Fabric or Premium capacity; this technique is available to anyone who has Python in Excel and works with Power BI Pro as well as Premium. So yes, in conclusion, I think there are some practical uses for this.

Errors · Excel · M · Power BI · Power Query

Understanding The “External table is not in the expected format” Error In Power Query In Power BI And Excel

Sometimes when you’re importing data from files using Power Query in either Power BI or Excel you may encounter the following error:

DataFormat.Error: External table is not in the expected format

What causes it? TL;DR it’s because you’re trying to load data from one type of file, probably Excel (I don’t think you can get this error with any other source but I’m not sure), and actually connecting to a different type of file.

Let’s see a simple example. Say you have a folder with two files: one is an Excel file called Date.xlsx and one is a CSV file called Date.csv.

Here’s the M code for a Power Query query that connects to the Excel file and reads the data from a table in it:

let
  Source     = Excel.Workbook(File.Contents("C:\MyFolder\Date.xlsx"), null, true), 
  Date_Table = Source{[Item = "Date", Kind = "Table"]}[Data]
in
  Date_Table

Now, if you change the file path in this query – and only the file path – to point at the CSV file instead like so:

let
  Source     = Excel.Workbook(File.Contents("C:\MyFolder\Date.csv"), null, true), 
  Date_Table = Source{[Item = "Date", Kind = "Table"]}[Data]
in
  Date_Table

…you will get the “external table is not in the expected format” error shown above. This is because your code is using the Excel.Workbook M function, which is used to import data from Excel workbooks, to connect to a file that is a CSV file and not an Excel workbook. The way to fix it is to use the appropriate function, in this case Csv.Document, to access the file like so:

let
  Source = Csv.Document(
    File.Contents("C:\MyFolder\Date.csv"), 
    [Delimiter = ",", Columns = 4, Encoding = 65001, QuoteStyle = QuoteStyle.None]
  ), 
  #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars = true])
in
  #"Promoted Headers"

To be honest, if making this change is beyond your Power Query skills and you’re sure you’re trying to connect to the right file, you’re better off creating a completely new query rather than editing the query you already have.

Another common scenario where you might encounter this error is when you’re importing data from all the files in a folder and one of the files isn’t in the correct format. For example, let’s say you have a folder with three Excel files in and you use the Folder data source to import all the data from all three files:

Since all three files are Excel files the Folder option will work:

However, if you take a CSV file and drop it into the folder like so:

Then you’ll get the same error in Power Query:

Apart from deleting the CSV file you have another option to solve this problem in this case: filtering the folder so you only try to get data from the .xlsx files and no other file type. To do this, click on the step that is called “Source”. When you do this you’ll see that the step returns a table containing all the files in the folder you’re pointing at:

You’ll see that the table in this step contains a column called Extension which contains the file extension for each file. If you filter this table – which will insert a new step at this point in the query, which is ok – by clicking on the down arrow in the Extension column, delselecting the (Select All) option and selecting “.xlsx” so the table only contains .xlsx files then you can avoid this problem:

If, as in this example, the rogue file happens to be the first file in the folder and you’ve selected that first file to be your “sample” file when setting up the import, then you’ll also need to go to the query called Sample File in the Queries pane and make exactly the same change there (ie click on the Source step and filter to remove any non .xlsx files).

Excel · Power BI · Power Query · SQL Server

Cross Database Query Folding For SQL Server Sources In Power Query In Power BI And Excel

A few months ago a new option was added to the Sql.Database and Sql.Databases functions in Power Query in Power BI and Excel which allows Power Query queries that combine data from different SQL Server databases to fold. Here’s a simple example showing how to use it.

On my local PC I have SQL Server installed and the Adventure Works DW 2017 and Contoso Retail DW sample databases:

Both of these databases have date dimension tables called DimDate. Let’s say you want to create a Power Query query that merges these two tables.

Here’s the M code for a Power Query query called DimDate AW to get just the DateKey and CalendarYear columns from the DimDate table of the Adventure Works DW 2017 database:

let
    Source = Sql.Database("localhost", "AdventureWorksDW2017"),
    dbo_DimDate = Source{[Schema="dbo",Item="DimDate"]}[Data],
    #"Removed Other Columns" = Table.SelectColumns(dbo_DimDate,{"DateKey", "CalendarYear"})
in
    #"Removed Other Columns"

Here’s the M code for a Power Query query called DimDate Contoso to get just the Datekey and CalendarYear columns from the DimDate table in the ContosoRetailDW database:

let
    Source = Sql.Database("localhost", "ContosoRetailDW"),
    dbo_DimDate = Source{[Schema="dbo",Item="DimDate"]}[Data],
    #"Removed Other Columns" = Table.SelectColumns(dbo_DimDate,{"Datekey", "CalendarYear"})
in
    #"Removed Other Columns"

Both of these Power Query queries fold. However if you create a third query to merge these two queries (ie do the equivalent of a SQL join between them) on the CalendarYear columns like so:

let
  Source = Table.NestedJoin(
    #"DimDate Contoso",
    {"CalendarYear"},
    #"DimDate AW",
    {"CalendarYear"},
    "DimDate AW",
    JoinKind.LeftOuter
  ),
  #"Expanded DimDate AW" = Table.ExpandTableColumn(
    Source,
    "DimDate AW",
    {"DateKey", "CalendarYear"},
    {"DimDate AW.DateKey", "DimDate AW.CalendarYear"}
  )
in
  #"Expanded DimDate AW"

…this query does not fold, because it combines data from two different SQL Server databases.

However if you edit the Sql.Database function in the Source step of both of the first two queries above to set the new EnableCrossDatabaseFolding option to true, like so:

    Source = 
    Sql.Database(
        "localhost", 
        "ContosoRetailDW", 
        [EnableCrossDatabaseFolding=true]
        ),

…then the query that merges these two queries does fold:

Dataflows · Excel · Fabric · Power Query

Power Query Templates In Excel And Fabric

Earlier this year a new feature was added to Power Query in Excel and Dataflows that very few people noticed: the abilty to export all the queries in the Power Query Editor to a template file. The reason few people noticed it was that, when the feature released, the only place you could import a template file was in a Power Platform dataflow. The documentation for exporting from Excel and importing to a Power Platform dataflow is very detailed and can be found here:

https://learn.microsoft.com/en-us/power-query/new-dataflow-from-template

With the release of Fabric you can now import templates into Gen2 Dataflows. This means you can export a template from a Gen1 dataflow and import it into a Gen2 dataflow, which is documented here:

https://learn.microsoft.com/en-us/fabric/data-factory/move-dataflow-gen1-to-dataflow-gen2

Joining the dots, this also means you can now export a template from Power Query in Excel and import it to a Gen2 dataflow. This makes it easy to develop on your PC before pushing your work up to Fabric. Here’s a quick walkthrough of how to do this.

In Excel Power Query let’s say you have one or more queries in the Power Query Editor:

If you then go to the File menu in the top left corner of the screen you’ll see the option to export to a template:

If you click this menu item, you’ll see a dialog where you can enter a name and description for the template:

Click Export and you can save the template to a .pqt file.

If you then create a new Gen2 Dataflow in Fabric then, in the middle of the screen, you’ll see the option to import from a template:

[Notice also the Export Template option in the top right corner]

If you import the .pqt file created in Excel the queries in it will get created in the dataflow. You may need to re-enter credentials:

Once you’ve done that, the queries run:

Template files are the way the cool kids share their Power Query queries these days and no kid is cooler than my colleague Alex Powers who, I saw on Twitter recently, has created a template file that scrapes all the ideas from the Fabric Ideas site:

[If you do import this template close to the time of writing this post you may run into this bug from the list of known issues for Fabric Data Factory and Dataflows]

Hopefully we’ll start to see more and more useful Power Query queries and functions shared as template files!

Cube Formulas · Excel · Power BI

Returning Tables From A Single Excel Cube Formula Connected To Power BI

After my post on tuning Excel cube functions last week there was some (justified) moaning on Twitter about their performance. This got me thinking about other things you could do with cube functions that might improve performance and I had a crazy idea…

Late last year two new DAX functions were introduced which didn’t get much attention: ToCSV and ToJSON. They allow you to evaluate a table expression and return the result as text in either CSV or JSON format, for example from a measure. They are intended to make debugging easier. What I realised is that you can use ToCSV to return a table of data in CSV format in a single Excel cube formula in a cell in your worksheet and then use dynamic arrays to turn that back into a table in Excel.

Using the dataset from last week’s post, with the following tables:

…I created the following measure:

MyTable =
TOCSV (
    SUMMARIZE (
        'Country',
        Country[Country],
        "Sales Amount", [Sales Amount],
        "Target Amount", [Target Amount]
    )
)

You can see it return a text value in CSV format from the following Power BI report:

I then created a connection to the published dataset from Excel and using a single formula in cell B3 and two slicers connected to the Product and Continent fields was able to create the following report:

Here’s the formula in B3:

=
LET(
CSV,
CUBEVALUE("CubeFunctionsOptimisationDataset",
CUBEMEMBER("CubeFunctionsOptimisationDataset","[Measures].[MyTable]"),
Slicer_Product,Slicer_Continent),
SplitToCells,
TEXTSPLIT(CSV,",", CHAR(10)),
HeaderRow,
TAKE(SplitToCells,1),
DataRows,
DROP(SplitToCells,1),
RemoveSquareBracketsFromHeaders,
TEXTBEFORE(TEXTAFTER(HeaderRow, "["),"]"),
Recombine,
VSTACK(RemoveSquareBracketsFromHeaders, DataRows),
Recombine)

Note: you also need to click the Wrap Text option on the ribbon for this cell for this to work properly.

There’s a lot going on here but the Excel Let function makes it easy to break the formula up and understand it. Here’s what each step in the formula does:

  • CSV uses the CubeMember and CubeValue functions to get the result of the MyTable measure for the current selection in both slicers.
  • SplitToCells takes the single text value returned by the previous step and uses the TextSplit function to turn it back into an array
  • HeaderRow finds the first row in this array, which contains the column headers using the Take function
  • DataRows finds all other rows, which contain the data in the table using the Drop function
  • RemoveSquareBracketsFromHeaders takes the contents of HeaderRow and uses the TextBefore and TextAfter functions to find the names of each column and measure, which the ToCSV function returns between square brackets (for example it takes ‘Country'[Country] and returns Country, and takes [Sales Amount] and returns Sales Amount)
  • Recombine then takes the new header row with the cleaned names and appends the array from DataRows underneath it, using the VStack function

This formula is a great candidate for use with the Lambda function so it can be easily reused.

There are a number of reasons this technique is interesting, even if it’s slightly hacky. First of all since the table is returned from a DAX expression that you write, and it’s returned by a single query, it could result in faster performance. Second, instead of just one cell returning one table, you can combine it with CubeMember functions to return multiple tables. In this example C3 and G3 contain CubeMember functions that the tables immediately below them refer to:

I haven’t tested this with very large tables. Since there is a limit on the amount of text that a measure can return then there is a limit on the size of the array that can be displayed; I’m also not sure what the performance overhead of the Excel formula to convert the text to the array is.

Analysis Services · Azure Analysis Services · Cube Formulas · Excel · Power BI · PowerPivot

Reducing The Number Of Queries Generated By Excel Cube Function Reports On Power BI, Analysis Services And Power Pivot

I am a big fan of using Excel cube functions for reporting on Power BI datasets, Analysis Services and Power Pivot: they allow for a lot more layout flexibility than PivotTables when building reports in Excel. However, they do have a reputation for poor performance and part of the reason for this is their chattiness. While Excel does not generate one query for each cell containing a cube function, it is true that a report using cube functions will generate a lot more MDX queries against your Power BI dataset/Analysis Services cube/Power Pivot mode than the equivalent PivotTable. As a result, one way to improve the performance of reports that use Excel cube functions is to optimise them to reduce the number of MDX queries generated.

To understand how to do this you first need to understand how Excel generates the MDX queries needed by cube functions. First of all it looks at the cells containing CubeValue functions on a worksheet and groups them together by the granularity of the data they are requesting; then, for each granularity, it runs one or more MDX queries to get the data it needs, where each query gets data for up to 500 cells. There’s not much you can do to control this behaviour, but in situations where you have multiple fact tables with different granularities there is a trick you can play to reduce the number of queries.

Let’s take a simple example. Consider the following source data:

…loaded into a Power BI dataset with two fact tables, Sales and Targets, and two dimension tables, Product and Country:

Now consider the following report that uses two groups of cube formulas to get the Sales Amount for Apples in the UK and the Target Amount for the UK:

Here are the formulas for these cells:

This worksheet generates two MDX queries for the two different granularities (plus one other MDX query that gets some metadata). The first gets the Sales Amount for Apples in the UK and populates the CubeValue function in cell D3. This query consists of a single MDX tuple whose granularity is Country, Measure and Product:

SELECT 
{([Country].[Country].&[UK],[Measures].[Sales Amount],[Product].[Product].&[Apples])} 
ON 0 
FROM [Model] 
CELL PROPERTIES VALUE, FORMAT_STRING, LANGUAGE, BACK_COLOR, FORE_COLOR, FONT_FLAGS

The second gets the Target Amount for the UK and populates the CubeValue function in cell D6. It consists of a single MDX tuple whose granularity is Country and Measure:

SELECT 
{([Country].[Country].&[UK],[Measures].[Target Amount])} 
ON 0 
FROM [Model] 
CELL PROPERTIES VALUE, FORMAT_STRING, LANGUAGE, BACK_COLOR, FORE_COLOR, FONT_FLAGS

It is possible to get the same data in a single MDX query and the key to doing so is to make the granularity of the two requests the same. One way of doing this is to edit the contents of cell D6, which at this point contains the following formula to get the Target Amount (in D5) for the UK (in C6) using the CubeValue function:

=CUBEVALUE("CubeFunctionsOptimisationDataset", C6,D5)

If you add an extra reference to cell C3, which contains the CubeMember function returning the Product Apples, like so:

=CUBEVALUE("CubeFunctionsOptimisationDataset", C6,D5, C3)

Then this results in exactly the same data being returned to Excel and exactly the same data being displayed in the worksheet, but with a single MDX query being generated:

SELECT 
{([Country].[Country].&[UK],[Measures].[Target Amount],[Product].[Product].&[Apples]),
([Country].[Country].&[UK],[Measures].[Sales Amount],[Product].[Product].&[Apples])} 
ON 0 
FROM [Model] 
CELL PROPERTIES VALUE, FORMAT_STRING, LANGUAGE, BACK_COLOR, FORE_COLOR, FONT_FLAGS

As you can see, this query now consists of two tuples whose granularity is Country, Measure and Product. The reason this works is because adding the reference to the Product Apples makes no difference to the value returned by the Target Amount measure – which has no relationship with the Product dimension table – but it is enough to fool Excel into thinking that the CubeValue function in cell D6 is making a request at the same granularity as the CubeValue function in cell D3. It is necessary to add a reference to an individual Product, such as Apples, rather than the All Member on Product though.

Another, perhaps more complicated, way of achieving the same result is to leave the formula in cell D6 the same but change the formula in C6 from:

=CUBEMEMBER("CubeFunctionsOptimisationDataset", "[Country].[Country].[All].[UK]")

…to use the tuple form of CubeMember to get the combination of Apples and UK:

=CUBEMEMBER("CubeFunctionsOptimisationDataset", {"[Product].[Product].[All].[Apples]","[Country].[Country].[All].[UK]"})

Note that when you use this tuple form of CubeMember, putting Apples first in the tuple and Country second results in only the Country name being displayed in the cell, so again the data displayed in Excel is exactly the same.

Reducing the number of MDX queries in this way can improve performance for two reasons: it reduces the number of round trips to the dataset and it gives the Analysis Services engine (in Power BI, Analysis Services or Power Pivot) the chance to get the data needed in a more optimal way than might be possible with multiple queries. I don’t think the technique in this post will make a massive difference to performance but if you do try this, I’m curious to see how much of an improvement you see.

Excel · M · Power BI · Power Query

Removing Diacritics From Text In Power Query

I faced an interesting challenge recently: I had to use Power Query to remove all non-ASCII characters from some text but maintain the readability of that text. Let’s take the French phrase “un garçon très âgé à Noël” (which I know doesn’t make much sense but is great for testing) as an example. If you remove everything apart from the basic latin alphabet characters and spaces using Text.Select as I blogged here:

Text.Select(
"un garçon très âgé à Noël", 
List.Combine(
{{"A".."Z"},{"a".."z"},{" "}}
)
)

[Rick de Groot has a very detailed post on Text.Select here that’s worth reading]

…then the output is “un garon trs g Nol”. As you can see, removing all the characters leads to unreadable text. Instead, what you have to do is find all the letters with diacritics (accents and other glyphs that can be added to characters) and remove the diacritics. Doing this may be ungrammatical and make it harder to understand the meaning of the text but in most cases the text will still be readable.

The bad news is that there is no straightforward way to do this in Power Query, and indeed there is no straightforward way to do this at all because there are no hard-and-fast rules about what to replace a letter with a diacritic with: should “ö” become “o” or “oe” for example? My first thought was to create a big lookup table with all the rules of what to replace each character with in, similar to the approach taken here for solving this problem in Google Sheets. Building a comprehensive lookup table would be gigantic task though.

Luckily there is another solution (thanks to Curt Hagenlocher for finding this one for me) that is described in this thread on StackExchange: converting the text to binary using the Greek (ISO) code page and converting it back to text again using the Text.FromBinary and Text.ToBinary M functions. I’m not sure I understand why it works but it seems to work well enough for my purpose. Here’s the M code to do this:

Text.FromBinary(
Text.ToBinary(
"un garçon très âgé à Noël", 
28597
)
)

The output of this is “un garcon tres age a Noel”, which is exactly what we want. Of course it’s not perfect and there are cases where characters can’t be converted. If you take the text “Malus × zumi” (it’s a type of crabapple apparently), then:

Text.FromBinary(
Text.ToBinary(
"Malus × zumi", 
28597
)
)

returns the text “Malus ? zumi” – the “×” has been replaced by “?”. As a result you’ll probably also want to replace any question marks with a space and then remove any excess spaces; I’m going to ignore the fact that doing this might also remove any question marks that were in the original text.

Here’s an M custom function that does all this and also solves my original problem of only returning ASCII characters:

(inputText as text) as text =>
  let
    ReplaceDiacritics = Text.FromBinary(Text.ToBinary(inputText, 28597), TextEncoding.Ascii), 
    RemoveQuestionMarks = Text.Replace(ReplaceDiacritics, "?", " "), 
    RemoveExcessSpaces = Text.Combine(
      List.RemoveItems(Text.Split(Text.Trim(RemoveQuestionMarks), " "), {""}), 
      " "
    )
  in
    RemoveExcessSpaces
Excel · M · Power BI · Power Query

Understanding The “A cyclic reference was encountered during evaluation” Error In Power Query In Power BI Or Excel

When working in the Power Query Editor in Power BI or Excel, especially when you’re editing M code, you may run into the following error:

Expression.Error: A cyclic reference was encountered during evaluation.

What does this message mean and what causes this error? In this blog post I’ll explain.

The first step to understanding what’s going on here is to understand how Power Query evaluates the steps in a query. I wrote a blog post on let expressions in M some time ago, which you can read here, and which goes into a lot of detail on this subject but the basics are quite easy to understand. Conside the following M query:

let
    Step1 = 1,
    Step2 = Step1 + 5,
    Step3 = Step2*2
in
    Step3

If you paste this into a new blank query in the Power Query Editor you’ll see it results in three steps in the Applied Steps on the right hand side of the screen:

The query itself, as you would expect, returns the value 12:

The important thing is to know that Power Query does not evaluate the steps in the order that they appear here – so it does not evaluate Step1 first, then Step2 and then Step3. What it does is evaluate Step3 first, which is needed for the output of the query, and then because it needs the value of Step2 to do this it evaluates Step2 next, and then because it needs the value of Step1 to evaluate Step2 it evaluates Step1 last. You can visualise the chain of dependencies between these steps like so:

Now consider the following M query:

let
    Step1 = Step3,
    Step2 = Step1 + 5,
    Step3 = Step2*2
in
    Step3

The change is that Step1 is now set to the value of Step3 instead of the value 1. This query gives you the “A cyclic reference was encountered during evaluation” error and you can probably see why: the query returns the value of Step3, which needs the value of Step2, which needs the value of Step1, which needs the value of Step3 – so there’s no way of calculating the output of the query. There’s a loop in the dependencies between the steps:

This is the cyclic reference (sometimes known as a “circular reference”) from the error message. As you can imagine it’s very easy to create cyclic references when you’re editing M code manually and changing step names.

It’s also possible to run into this error when using referenced queries, that’s to say when one Power Query query uses the output of another Power Query query as its data source. For example, let’s say you had a query called Query1 with the following M code:

let
    Source = Query2
in
    Source

…and a query called Query2 with the following M code:

let
    Source = Query1
in
    Source

Once again there is a cyclic reference: Query1 refers to the output of Query2 and Query2 refers to the output of Query1. The error message returned for Query2 is:

An error occurred in the ‘Query1’ query. Expression.Error: A cyclic reference was encountered during evaluation.

An extra problem in this case is that if you try to delete just one of these queries, you’ll get the following error message:

The query “Query1” cannot be deleted because it’s being referenced by another query: “Query2”.

You have to select both queries and then delete them, or of course edit one of the queries to break the cyclic reference.

One last thing to mention: recursion is possible in M, meaning that you can write a custom function that calls itself (Miguel Escobar has a detailed post on this here) but it can be very hard to get it working and it’s not always the most efficient way to solve a problem.