The M Behind The New Power Query Data Types In Excel

The big news this week – at least for me – was the release of the new Power Query data types to the Excel insiders channel. You can read all about it here:

https://insider.office.com/en-us/blog/power-query-data-types-in-excel

They’re the latest manifestation of Excel linked data types, and there are also some cool things happening with linked data types and Power BI featured tables.

The announcement blog post explains pretty much everything you can do right now with Power Query data types but I was curious about the M code that is used to create them. Here’s an example query that takes this source table:

…and creates this Power Query data type:

let
  Source = #table(
      type table[
        Fruit = text, 
        Colour = text, 
        Sales = number
      ], 
      {
        {"Apples", "Green", 10}, 
        {"Lemons", "Yellow", 20}, 
        {"Strawberries", "Red", 30}
      }
    ),
  #"Created data type"
    = Table.CombineColumnsToRecord(
        Source, 
        "Data type", 
        {"Fruit", "Colour", "Sales"}, 
        [DisplayNameColumn = "Fruit", TypeName = "Excel.DataType"]
      )
in
  #"Created data type"

The magic happens with the #"Created data type" step and the Table.CombineColumnsToRecord function; so Power Query data types are basically columns that contain record values with (I guess, I need to check) some extra metadata.
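A query along the following lines could be used to test that guess. It is a sketch, not something confirmed in Excel: it rebuilds the same sample table, reuses the same options record, and then returns the field names of the first record in the new column together with any metadata found on that record, on the column's type and on the table's type. Whether the data type information actually lives in any of those places (rather than somewhere Excel tracks internally) is an assumption still to be checked.

```powerquery
let
    // Same sample data and options record as the query above
    Source = #table(
        type table [Fruit = text, Colour = text, Sales = number],
        {
            {"Apples", "Green", 10},
            {"Lemons", "Yellow", 20},
            {"Strawberries", "Red", 30}
        }
    ),
    DataType = Table.CombineColumnsToRecord(
        Source,
        "Data type",
        {"Fruit", "Colour", "Sales"},
        [DisplayNameColumn = "Fruit", TypeName = "Excel.DataType"]
    ),
    // The value in the first row of the new column is a record
    FirstValue = DataType{0}[Data type],
    // The type of the new column, where extra metadata might be attached
    ColumnType = Type.TableColumn(Value.Type(DataType), "Data type"),
    // Collect everything in one record so it can be browsed in the editor
    Inspect = [
        FieldNames = Record.FieldNames(FirstValue),
        ValueMetadata = Value.Metadata(FirstValue),
        ColumnTypeMetadata = Value.Metadata(ColumnType),
        TableTypeMetadata = Value.Metadata(Value.Type(DataType))
    ]
in
    Inspect
```

Each metadata field can be clicked in the Power Query Editor to see whether it is an empty record or contains something interesting.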

Power Query Geography And Geometry Functions In Power BI And Excel

In the August 2020 release of Power BI Desktop several new Power Query functions were added: Geography.FromWellKnownText, Geography.ToWellKnownText, GeographyPoint.From, Geometry.FromWellKnownText, Geometry.ToWellKnownText and GeometryPoint.From. These functions (which are coming soon to Power Query in Excel too) make it easier to work with geographic and geometric data in the Well Known Text format. You can have all kinds of fun with these functions if you have a visual (like the Icon Map custom visual) that can display Well Known Text data, but I’ll leave that kind of thing for future blog posts. In this post I’ll explain the basics of how the functions actually work.

Let’s start with points. The Geography.FromWellKnownText and Geometry.FromWellKnownText functions convert Well Known Text values into records, which makes it simple to do things like extract latitude and longitude values without having to do complex parsing of the text itself. For example:

Geometry.FromWellKnownText("POINT (0.1 0.2)")

Returns a record that looks like this:

Whereas the following:

Geography.FromWellKnownText("POINT (0.1 0.2)")

…returns a similar record but one with fields for latitude and longitude rather than x and y:

Now that you know what the record format of a point looks like, it’s easy to go in the other direction and convert a record into Well Known Text format. For example, the M expression:

Geography.ToWellKnownText([Kind="POINT", Longitude=0.1, Latitude=0.3])

returns the text value

POINT(0.1 0.3)
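Reading the co-ordinates back out of a parsed point is then just ordinary record field access. Here is a small sketch, assuming the Longitude and Latitude field names used in the Geography.ToWellKnownText example above; it also converts the parsed record straight back to Well Known Text.

```powerquery
let
    // Parse a WKT point into a geography record
    PointRecord = Geography.FromWellKnownText("POINT (0.1 0.2)"),
    // In WKT the first number is the longitude and the second is the latitude
    Lon = PointRecord[Longitude],
    Lat = PointRecord[Latitude],
    // Go back the other way: record to WKT text
    BackToText = Geography.ToWellKnownText(PointRecord)
in
    // Step names differ from the field names to avoid a cyclic reference inside the record
    [Longitude = Lon, Latitude = Lat, WKT = BackToText]
```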

You can also generate a point by passing a longitude and latitude to GeographyPoint.From, or x and y co-ordinates to GeometryPoint.From (both functions also accept other optional parameters). For example:

GeographyPoint.From(0.1, 0.3)

…also returns a POINT record:

Points can then be used to build more complex objects, which in turn can be combined into more complex objects still. For example the following M query takes three points and creates a LINESTRING object that joins those three points into a line:

let
    Point1 = GeographyPoint.From(0,0),
    Point2 = GeographyPoint.From(1,1),
    Point3 = GeographyPoint.From(0,1),
    PointList = {Point1,Point2,Point3},
    LineRecord = [Kind="LINESTRING", Points=PointList],
    WKTLineString = Geography.ToWellKnownText(LineRecord)
in
    WKTLineString

The output is this:

LINESTRING(0 0, 1 1, 0 1)
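The conversion should work in the other direction too. The sketch below parses the LINESTRING text back into a record and turns its points into a table. It assumes that Geography.FromWellKnownText returns a record with the same Points field that was used to build the line above; only the Longitude and Latitude fields of each point are kept, since any other fields the parser adds are not covered here.

```powerquery
let
    // Parse the WKT produced by the previous query
    LineRecord = Geography.FromWellKnownText("LINESTRING(0 0, 1 1, 0 1)"),
    // Assumption: the parsed record has a Points field holding a list of POINT records
    PointList = LineRecord[Points],
    // Keep just the co-ordinates from each point record
    Coordinates = List.Transform(
        PointList,
        each Record.SelectFields(_, {"Longitude", "Latitude"})
    ),
    PointTable = Table.FromRecords(Coordinates)
in
    PointTable
```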

Similarly, multiple LINESTRING objects can be combined into a MULTILINESTRING. For example:

let
    Point1 = GeographyPoint.From(0,0),
    Point2 = GeographyPoint.From(1,1),
    PointList1 = {Point1,Point2},
    Point3 = GeographyPoint.From(0,1),
    Point4 = GeographyPoint.From(1,0),
    PointList2 = {Point3, Point4},
    LineRecord1 = [Kind="LINESTRING", Points=PointList1],
    LineRecord2 = [Kind="LINESTRING", Points=PointList2],
    LineRecordList = {LineRecord1,LineRecord2},
    MultiLineRecord = [Kind="MULTILINESTRING", Components=LineRecordList],
    WKTMultiLineString = Geography.ToWellKnownText(MultiLineRecord)
in
    WKTMultiLineString

…returns:

MULTILINESTRING((0 0, 1 1), (0 1, 1 0))
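Writing out every point by hand gets tedious quickly. One option, sketched below, is a small helper function (ToLineString is a hypothetical name, not a built-in function) that turns a list of {longitude, latitude} pairs into a LINESTRING record. Applied to the same four points, it builds the same records as the query above, so it should produce the same MULTILINESTRING text.

```powerquery
let
    // Hypothetical helper: list of {longitude, latitude} pairs -> LINESTRING record
    ToLineString = (pairs as list) as record =>
        [
            Kind = "LINESTRING",
            Points = List.Transform(pairs, each GeographyPoint.From(_{0}, _{1}))
        ],
    // One inner list of co-ordinate pairs per line
    Lines = {
        {{0, 0}, {1, 1}},
        {{0, 1}, {1, 0}}
    },
    MultiLineRecord = [
        Kind = "MULTILINESTRING",
        Components = List.Transform(Lines, ToLineString)
    ],
    WKTMultiLineString = Geography.ToWellKnownText(MultiLineRecord)
in
    WKTMultiLineString
```

The same pattern works for GeometryPoint.From and Geometry.ToWellKnownText if your data uses x and y co-ordinates instead.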

Finally, the most useful application for this is to extract latitudes and longitudes or x and y co-ordinates from a column of values in WKT format, something like this:

All you need to do to extract latitude and longitude values from these points is add a custom column as a step with the expression

Geography.FromWellKnownText([MyPoints])

The result is a column of records, which can be expanded to get the latitude and longitude values.
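Put together as a complete query, that might look like the sketch below. The sample table and its MyPoints values are made up for illustration; in a real query the Source step would point at your own data.

```powerquery
let
    // Hypothetical sample data: one WKT point per row in a column called MyPoints
    Source = #table(
        type table [MyPoints = text],
        {{"POINT (0.1 0.2)"}, {"POINT (1.5 2.5)"}}
    ),
    // Custom column: parse each WKT value into a geography record
    AddedRecords = Table.AddColumn(
        Source,
        "PointRecord",
        each Geography.FromWellKnownText([MyPoints])
    ),
    // Expand just the co-ordinate fields into ordinary columns
    Expanded = Table.ExpandRecordColumn(
        AddedRecords,
        "PointRecord",
        {"Longitude", "Latitude"}
    )
in
    Expanded
```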

Adding Tooltips To Column Headers In The Power Query Editor In Power BI And Excel

Here’s something that will excite all you M nerds out there! Did you know you can make tooltips appear when you hover over column headers in a table in the Power Query Editor in either Power BI Desktop or Excel?

It’s all done with M metadata, and here’s an example query:


let
    source =
        #table({"firstname", "lastname"}, {{"matt", "masson"}}),
    tableType =
        type table [firstname = Text.Type, lastname = Text.Type]
        meta [
            Documentation.FieldDescription =
                [firstname = "Given Name", lastname = "Family Name"]
        ],
    replaceType = Value.ReplaceType(source, tableType)
in
    replaceType

Here’s what you see when your mouse hovers over the firstname column in the Power Query Editor:

…and here’s what you see when your mouse hovers over the lastname column:

How does this work? Here’s what each of the steps does:

  • The source step creates a simple table with two columns called firstname and lastname using #table (see here for more details on that).
  • The tableType step declares a new table type with two columns (the same two columns as in the table from the previous step) and then adds a metadata record to this type. In that record, the Documentation.FieldDescription field contains the text values that will appear as tooltips when you hover over each column.
  • The replaceType step replaces the type of the table returned by source with the type declared in tableType.

The Power Query Editor UI then looks for a Documentation.FieldDescription field in any metadata associated with a table and displays the values in that field when you hover over the appropriate column.

Note that if you add any steps to your query after this that change the table type (for example, steps that add or remove columns), the metadata is removed ☹. That said, I still think this could be a useful feature for anyone building a custom connector, for example.
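Because later steps can strip the metadata, one way around this is to attach the descriptions in the very last step of a query. The sketch below wraps the same technique in a reusable function. AddFieldDescriptions is a hypothetical name, and rather than declaring a new table type from scratch it assumes it is fine to take the table's existing type and add the metadata to that.

```powerquery
let
    // Hypothetical helper: adds column tooltips to any table as a final step
    AddFieldDescriptions = (inputTable as table, descriptions as record) as table =>
        let
            // Start from the table's current type so column names and types are kept
            CurrentType = Value.Type(inputTable),
            // Attach the tooltip text as metadata on that type
            NewType = CurrentType meta [Documentation.FieldDescription = descriptions]
        in
            Value.ReplaceType(inputTable, NewType),
    Source = #table({"firstname", "lastname"}, {{"matt", "masson"}}),
    WithTooltips = AddFieldDescriptions(
        Source,
        [firstname = "Given Name", lastname = "Family Name"]
    )
in
    WithTooltips
```

Value.Metadata(Value.Type(WithTooltips)) can be used to confirm that the Documentation.FieldDescription record has been attached.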

[Thanks to Matt Masson for telling me how all this works]

Video: Power BI Data Privacy Settings Deep Dive

Over the past year or so I’ve been delivering a presentation on the Power Query engine’s data privacy settings at various conferences and Power BI user groups, in an attempt to pull together all the knowledge I have on this complex topic. Luckily, when I presented this session at the London Power BI User Group recently, they recorded it and posted it on YouTube here:

If you’re struggling with data privacy errors like:

Formula.Firewall: Query ‘Query1’ (step ‘xyz’) is accessing data sources that have privacy levels which cannot be used together. Please rebuild this data combination.

or

Formula.Firewall: Query ‘Query1’ (step ‘xyz’) references other queries or steps, so it may not directly access a data source. Please rebuild this data combination.

…in either the Power Query Editor in Power BI or Power Query/Get&Transform in Excel, then I hope this video will help you understand why you’re getting these errors and what you can do to avoid them.

A lot of what I show in this session draws on other material, such as:

  • My five-part series of posts on the Power Query data privacy settings that starts here
  • My post on how credentials and data privacy settings are stored for dynamic data sources here
  • My post here on the performance overhead of applying data privacy checks
  • Ehren von Lehe’s detailed paper (available here) on how the engine partitions data sources while applying data privacy checks. One of the things I show in the video is that it’s now possible to see these partitions using Power Query Query Diagnostics (see here for some details – although I should probably devote a whole blog post to this in the future)

Lastly, one minor correction to something I said in the video: at the 44:32 mark I do a demo that shows how you can use M functions to avoid a Formula Firewall error. While this trick works in Power BI Desktop and Excel, it results in a dataset that can’t be refreshed in the Power BI Service unfortunately.

Power Query Formatter

If you’ve ever used DAX Formatter to format your DAX code and wondered why there isn’t an equivalent for Power Query/M, then wonder no more: the nice people at Power Pivot Insights in Germany (read their German-language blog here) have built one. You can find it here:

https://powerqueryformatter.com/

They also have an API you can use.

Bonus information: did you know that Microsoft also has an open source parser for M, available here?

Understanding The “The key didn’t match any rows in the table” Error In Power Query In Power BI Or Excel

One of the most common errors you’ll see when working with Power Query in Power BI or Excel is this:

Expression.Error: The key didn’t match any rows in the table

It can occur with almost any data source and in a wide variety of different circumstances, and for new users of Power Query it can be very confusing. In this post I’ll explain what the error message means and when you’re likely to encounter it using a simple example.

TL;DR You’re probably getting this error because your Power Query query is trying to connect to a table, worksheet or some other object in your data source that has been deleted or renamed.

Say you have an Excel workbook with three worksheets in it, called Sheet1, Sheet2 and Sheet3:

On Sheet1 there’s some data you want to load into Power BI or Excel using Power Query:

It’s very easy to do this: after you click on the Excel data source and select your Excel workbook, you see the Navigator window showing the contents of the workbook, where you select Sheet1:

…click OK and your query is ready:

What actually happens in this query though? If you click on each of the four steps in the Applied Steps pane on the right-hand side of the screen, you’ll see what each step does. In particular, notice that the first step in the query (called Source) returns a table with one row for each worksheet, table and named range in the workbook:

This is something that happens with many types of data source: after you have connected you have to choose what data inside the data source you want to connect to in the first step, and the list of things you can connect to is returned in a table. This table lists the same things displayed in the Navigator dialog shown above. In the same way, if you connect to a SQL Server database this first step returns a table with all the tables and views in the database; if you connect to an OData feed the first step shows all the resources available through the feed.

The other thing to notice in the table in the screenshot is the Data column, which contains nested table values. If you were to click on one of these nested tables, you would navigate to the data in the worksheet named on that row, and that is in fact what the second step in the query does.

Here’s the M code for the whole query:


let
    Source =
        Excel.Workbook(
            File.Contents("C:\MyExcelFile.xlsx"),
            null,
            true
        ),
    Sheet1_Sheet =
        Source{[Item="Sheet1",Kind="Sheet"]}[Data],
    #"Promoted Headers" =
        Table.PromoteHeaders(
            Sheet1_Sheet,
            [PromoteAllScalars=true]
        ),
    #"Changed Type" =
        Table.TransformColumnTypes(
            #"Promoted Headers",
            {{"Month", type text}, {"Sales", Int64.Type}}
        )
in
    #"Changed Type"

If you want to understand how the code in the second step of this query gets the contents of the cell containing the nested table, I have a blog post here that goes into a lot more detail. Basically the second step (called Sheet1_Sheet in the code, but shown as Navigation in the Applied Steps pane) returns the nested table in the Data column from the row in the table where the Item column contains the value “Sheet1” and the Kind column contains the value “Sheet”. It’s the contents of the Item and Kind columns that are used to identify the row in the table that contains the data you want to see – in database terminology these columns are the key columns on the table.
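To see the key lookup on its own, without needing an Excel file, the sketch below builds a hypothetical stand-in for the table returned by the Source step (just the Item, Kind and Data columns, with made-up sheet contents that already have headers) and then uses the same lookup expression as the Sheet1_Sheet step.

```powerquery
let
    // Hypothetical stand-in for the output of Excel.Workbook: one row per worksheet
    Source = #table(
        type table [Item = text, Kind = text, Data = table],
        {
            {"Sheet1", "Sheet", #table({"Month", "Sales"}, {{"January", 10}})},
            {"Sheet2", "Sheet", #table({"Month", "Sales"}, {})},
            {"Sheet3", "Sheet", #table({"Month", "Sales"}, {})}
        }
    ),
    // Find the single row where Item = "Sheet1" and Kind = "Sheet", then return its Data value
    Sheet1_Sheet = Source{[Item = "Sheet1", Kind = "Sheet"]}[Data]
in
    Sheet1_Sheet
```

If no row matches the key values, this expression raises the error described in this post; if more than one row matches, it raises a different error instead.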

What happens if you change the name of the worksheet that contains the data you want, from “Sheet1” to “Hello”?

The result is that the first step of the query now returns a table that looks like this:

Since there isn’t a row where the Item column contains “Sheet1” any more, the second step can no longer find the row it’s looking for – the key value it’s looking for no longer exists in the table – which is why you see the error message you do:

There are two ways to fix the error:

  1. Go to the Excel workbook and change the name of the sheet called “Hello” back to “Sheet1” or
  2. In either the Advanced Editor or the Formula Bar, edit the M code and replace the reference to “Sheet1” with “Hello” in the second step of the query.

In summary, when you’re working with data sources in Power Query you need to be aware that if something in your data source is renamed or deleted then your query will break. You can of course write some clever M code to deal with situations like this but my recommendation is to try to fix the problem in the data source and not in your query.
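For completeness, here is a sketch of the kind of M code that can work around a missing sheet, even though fixing the source is still the better option. It uses a hypothetical stand-in for the Source step after the rename, shows that the normal lookup fails (captured with try), and then uses the optional item access operator ? (which returns null instead of raising an error when no row matches) to fall back to the first worksheet. Falling back like this can silently load the wrong data, which is one more reason to prefer fixing the data source.

```powerquery
let
    // Hypothetical stand-in for the Source step after Sheet1 was renamed to "Hello"
    Source = #table(
        type table [Item = text, Kind = text, Data = table],
        {
            {"Hello", "Sheet", #table({"Month", "Sales"}, {{"January", 10}})},
            {"Sheet2", "Sheet", #table({"Month", "Sales"}, {})},
            {"Sheet3", "Sheet", #table({"Month", "Sales"}, {})}
        }
    ),
    // A normal key lookup raises the "key didn't match" error, captured here with try
    Attempt = try Source{[Item = "Sheet1", Kind = "Sheet"]},
    ErrorMessage = if Attempt[HasError] then Attempt[Error][Message] else null,
    // Adding ? makes the lookup return null instead of raising an error
    Sheet1Row = Source{[Item = "Sheet1", Kind = "Sheet"]}?,
    // Fall back to the first worksheet in the workbook if Sheet1 is missing
    ChosenRow =
        if Sheet1Row = null
        then Table.SelectRows(Source, each [Kind] = "Sheet"){0}
        else Sheet1Row,
    Result = [Message = ErrorMessage, Data = ChosenRow[Data]]
in
    Result
```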

Excel Dynamic Arrays And The CubeSet Function

Among all the exciting announcements made at Ignite last week, one you may have missed (even among all the Excel-related announcements here) was that dynamic arrays have finally reached GA. Ever since they were announced I’ve been interested in how they can be used with Excel cube functions, which allow you to get data from Analysis Services, Power BI and the Excel Data Model/Power Pivot into cells on the worksheet, and I’ve blogged about this once already. Even though right now the CubeValue function can’t be used with dynamic arrays – alas – there are still some interesting ways dynamic arrays and cube functions can be used together and in this post I’ll show you another one I’ve found.

Say you have the following set of tables in the Excel Data Model:

And the following measure that counts the number of customers who bought something:

Distinct Customers:=DISTINCTCOUNT('Internet Sales'[CustomerKey])

Let’s also say you have on a worksheet the following table of data showing the number of distinct customers broken down by product, created using the CubeMember and CubeValue functions:

Here are the formulas for this table:

So far, all very straightforward. Now let’s say you want to use a dynamic array to filter the rows in this table so you only see the products that have more than a certain number of distinct customers, and you want to see a grand total for this filtered list of customers, like so:

Here are the formulas to achieve this:

To explain what’s going on here:

  • Cell C2 contains the text “ThisWorkbookDataModel”, the name of the connection to the Excel Data Model. Doing this makes the other Excel formulas here much easier to read!
  • Cell F2 – which I’ve turned into a named range called FilterThreshold – contains the number of customers which a product must exceed to be shown.
  • Cell E5 contains the dynamic array Filter function that filters the rows shown in the first two screenshots above and only returns the products that exceed the threshold. The formula is:
    FILTER(B5:B134,C5:C134>FilterThreshold)
  • Cell E6 does something similar to E5 but returns the number of distinct customers for each product. If all I wanted was the products and the number of distinct customers, I wouldn’t have needed two separate formulas (I could have used one), but I need to add the grand total too…
  • Cell E4 contains the formula:
    CUBESET($C$2, E5#, "Total")
    This creates a set from the spill reference E5#, which contains the cells containing CubeMember formulas returned by the Filter function. For some reason CubeSet returns an error if you try to use a dynamic array formula in its second parameter but using a spill reference works ok; this is why I had to use separate formulas in E5 and E6.
  • Finally, cell F4 contains the formula that returns the total distinct count for all products returned in the filtered list using CubeValue and the set created in cell E4:
    CUBEVALUE($C$2, "[Measures].[Distinct Customers]", E4)
    Since this is a distinct count the only way to get the correct value here is using CubeSet, CubeValue and the DAX measure – there would be no way to calculate the correct value using Excel formulas. The same could be said of almost any other DAX measure that wasn’t a simple aggregation.

You can download the sample workbook for this post here.

Excel PivotTable Query Performance With Analysis Services, Power Pivot And Power BI (Revisited)

Back in 2016 I wrote the following blog post about changes to the way Excel 365 generated MDX queries for PivotTables connected to Analysis Services, Power Pivot/the Excel Data Model and Power BI datasets:

https://blog.crossjoin.co.uk/2016/07/08/excel-2016-pivottable-mdx-changes-lead-to-big-query-performance-gains/

I know it sounds boring and not something you need to worry about but trust me, this is important – these changes solved the vast majority of Excel PivotTable performance problems that I encountered when I was a consultant so you should read the above post before continuing.

Unfortunately, earlier this year these changes had to be partially rolled back because in some rare cases the queries generated returned incorrect results; this means that you may find that values for subtotals and grand totals are again being returned even when they aren’t being displayed. The good news is that you should still be able to get the improved performance with a few minor tweaks.

Using the example from the previous post, a PivotTable connected to SSAS MD and the Adventure Works cube (the queries generated for SSAS Tabular, Power Pivot and Power BI may be slightly different and slightly better but the basic problem is the same), if you build the following in Excel:

…even though the subtotals in the PivotTable are not displayed in Excel, the MDX query generated by Excel not only returns them but also returns others that are not needed at all – in fact 36 rows (although the query returns them as columns) of data rather than the 13 rows that are displayed. Here’s a screenshot of the results returned when you run the MDX query in SQL Server Management Studio:

Any time you see a row or column containing an All Member (in this case All Customers or All Products) you know you are looking at a subtotal or grand total.

[NB The easiest way to get the MDX query generated by your PivotTable is to install the OLAP PivotTable Extensions add-in https://olappivottableextensions.github.io/]

In this case, to get an efficient query, you need to explicitly turn off subtotals and grand totals for the PivotTable in Excel:

In the PivotTable the only difference you’ll see is that the grand totals are now not displayed:

But the query generated now only returns the values that are actually needed, and as a result will be a lot more efficient and potentially a lot faster. In this case, the query now only returns the 12 rows of data displayed:

Even with subtotals and grand totals turned off there are still some cases where unwanted values will be returned. Take the following PivotTable, where I have used the Calendar hierarchy from the Date dimension on columns and filtered it so I only see the three months in Q1 CY 2012:

The MDX query generated by Excel requests four extra columns with subtotals and a grand total that aren’t displayed:

There is a fairly simple workaround though. By changing how the PivotTable is constructed – in this case by not putting a hierarchy on columns but just the Month Of Year attribute, then adding slicers for Calendar Year and Calendar Quarter to control which months are displayed – you can get an efficient query. This version of the PivotTable:

…results in this query being generated, which only returns the required values:

To sum up, then, if you have a PivotTable that seems a bit slow:

  • Turn off subtotals and grand totals unless you really need to see them
  • Look at the MDX query being generated by Excel and see if it is still returning unnecessary subtotals and grand totals, and if it is try rebuilding the PivotTable to get the same results in a different way (for example by not drilling down on hierarchies as in the last example) to get a more efficient query

[All the queries in this post were generated by Excel 365 version 1910 build 12130.20238]

Excel Dynamic Arrays And FilterXML

I’ll admit that I’m a bit less excited about Excel dynamic arrays than I was when I last blogged about them. Don’t get me wrong: from a pure Excel point-of-view they are still very cool, but I’ve since found out that the CubeValue function can’t be used with dynamic arrays which prevents me from doing all the really fun SSAS/Power BI/Power Pivot things I wanted to do with cube formulas.

It’s not all doom and gloom though. Several years ago I blogged about the then-new WebService and FilterXML functions (here and here). I very quickly found that the WebService function was very limited indeed and that Power Query did everything it did, but better; on the other hand I felt FilterXML had some unexplored potential, although I never got round to doing any exploring. Fast forward to last year and I saw that FilterXML was one of a number of existing functions affected by the new dynamic array behaviour, a change that makes it even more flexible.

Consider the following, publicly-available XML file:

http://www.hmrc.gov.uk/softwaredevelopers/rates/exrates-monthly-0719.xml

It’s a list of exchange rates published by the UK tax authorities and the contents look like this:

Just the kind of data you want to bring into Excel. The WebService function doesn’t work with this URL for some reason but it’s very easy to bring this data into an Excel table using Power Query with a few clicks using the Web data source:

If you prefer to work with dynamic arrays though (and I’m sure there are lots of reasons why that might be true), you can do that with a combination of Power Query and FilterXML.

The first thing to do is to use Power Query to load the entire XML document into a single cell in a worksheet. Here’s a query to do this:

let
    Source = 
    Text.FromBinary(
        Web.Contents(
            "http://www.hmrc.gov.uk/softwaredevelopers" &
            "/rates/exrates-monthly-0719.xml"
        )
    ),
    #"Converted to Table" = 
    #table(
        1, 
        {{Source}}
    ),
    #"Cleaned Text" = 
    Table.TransformColumns(
        #"Converted to Table",
        {{"Column1", Text.Clean, type text}}
    ),
    #"Replaced Value" = 
    Table.ReplaceValue(
        #"Cleaned Text",
        "> <",
        "><",
        Replacer.ReplaceText,{"Column1"}
    ),
    #"Replaced Value1" = 
    Table.ReplaceValue(
        #"Replaced Value",
        ">  <",
        "><",
        Replacer.ReplaceText,{"Column1"}
    ),
    #"Replaced Value2" = 
    Table.ReplaceValue(
        #"Replaced Value1",
        ">   <",
        "><",
        Replacer.ReplaceText,
        {"Column1"}
    ),
    #"Replaced Value3" = 
    Table.ReplaceValue(
        #"Replaced Value2",
        ">    <",
        "><",
        Replacer.ReplaceText,{"Column1"}
    )
in
    #"Replaced Value3"

One interesting point to make here: the FilterXML function does not like spaces between closing and opening angle brackets in XML (maybe this is why WebService errors too?) so I’m removing all occurrences of this, as well as removing any unprintable characters. Here’s the output, a table with one column and one row where the only cell contains the full XML:

You can then use an Excel formula like this to run an XPath query against this XML document:

=FILTERXML($A$2, "/exchangeRateMonthList/exchangeRate/countryName")

to get a list of all the country names spilling out to as many rows in the worksheet as necessary:

[If you want a comparison with how FilterXML used to work in all its CTRL+SHIFT+ENTER glory see the “Scraping a whole XML document” section here; if you want to learn XPath, the query language used by FilterXML, there is a good tutorial here]
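The four Table.ReplaceValue steps in the query that loads the XML only remove runs of one to four spaces between a closing and an opening angle bracket. A more general sketch is below. RemoveWhitespaceBetweenTags is a hypothetical helper that splits the text on "<" and drops any whitespace-only text that follows the last ">" in each piece. It assumes that whitespace between tags is never meaningful in the document, and it could also trim trailing spaces from element text that itself contains a ">" character.

```powerquery
let
    // Hypothetical helper: removes whitespace-only text between ">" and the next "<"
    RemoveWhitespaceBetweenTags = (xml as text) as text =>
        let
            // After splitting on "<", text following a tag sits after the last ">" in each piece
            Pieces = Text.Split(xml, "<"),
            Cleaned = List.Transform(
                Pieces,
                (piece) =>
                    let
                        LastClose = Text.PositionOf(piece, ">", Occurrence.Last),
                        Trailing =
                            if LastClose = -1 then piece
                            else Text.Middle(piece, LastClose + 1)
                    in
                        // Drop the trailing text only when it is nothing but whitespace
                        if LastClose <> -1 and Text.Trim(Trailing) = ""
                        then Text.Start(piece, LastClose + 1)
                        else piece
            )
        in
            Text.Combine(Cleaned, "<"),
    // Small test string with spaces between tags and inside element text
    Result = RemoveWhitespaceBetweenTags("<a>   <b>x y</b>  </a>")
in
    Result
```

Saved as its own query, the function could replace the four Replaced Value steps with a single step such as Table.TransformColumns(#"Cleaned Text", {{"Column1", RemoveWhitespaceBetweenTags, type text}}).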

I’m not an XPath expert, or even an Excel expert, so I’ll finish here but hopefully this will prove useful to someone. You can download an Excel workbook containing the demos from this post here – note that neither Power Query nor dynamic arrays work in Excel Online yet, so don’t look at the workbook in the browser.
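If you would rather stay inside Power Query, the same path that the XPath query follows can be walked with Xml.Document. The sketch below makes two assumptions: it uses a tiny, made-up XML string with the same element names as the XPath query above (the country names are placeholders, not values from the HMRC file), and it relies on Xml.Document returning a table where an element's Value column holds either a nested table of child elements or, for a leaf element, its text.

```powerquery
let
    // Made-up XML with the same element names as the HMRC file
    SampleXml =
        "<exchangeRateMonthList>"
            & "<exchangeRate><countryName>Country A</countryName></exchangeRate>"
            & "<exchangeRate><countryName>Country B</countryName></exchangeRate>"
            & "</exchangeRateMonthList>",
    // One row for the root element; its Value column holds a table of child elements
    Doc = Xml.Document(Text.ToBinary(SampleXml)),
    RootChildren = Doc{0}[Value],
    // Equivalent of /exchangeRateMonthList/exchangeRate
    ExchangeRates = Table.SelectRows(RootChildren, each [Name] = "exchangeRate"),
    // Equivalent of .../countryName: take the text of each countryName child
    CountryNames = List.Combine(
        List.Transform(
            ExchangeRates[Value],
            (children) => Table.SelectRows(children, each [Name] = "countryName")[Value]
        )
    )
in
    CountryNames
```

Unlike FILTERXML, this returns an M list, which would still need to be converted to a table and loaded before it appears on a worksheet.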

Book Review: “Collect, Combine And Transform Data Using Power Query In Excel And Power BI” by Gil Raviv

I generally try to avoid writing book reviews here, but the fact that there are so few books available on Power Query and M means that I’m making an exception for “Collect, Combine and Transform Data using Power Query in Excel and Power BI” by Gil Raviv.

The first thing to say about this book is that it takes the approach of teaching through exercises and worked examples, rather than by explaining abstract concepts. If this suits your style of learning (and I know that it does for a lot of people) then you’re in luck; if you’re looking for a book that will explain what all the different join types for Merge operations do, for example, then you’ll be disappointed. This isn’t a criticism though – I don’t think it’s possible to write a book that will satisfy everyone – and Gil has done a good job of covering a lot of common data preparation scenarios. One important exception to this is the chapter on M which provides a very clear introduction to the language and the way it works. I suspect a lot of people will want to buy the book for this chapter alone.

The second thing to say about this book is that while it covers both Power Query in Excel and Power BI, in my opinion it’s aimed slightly more at Excel users. Again, this is not a criticism: although advanced Excel users and Power BI report designers have to solve many of the same problems, they also have some very different concerns. What’s more, if you can assume your readers have good Excel skills and can explain Power Query concepts in Excel terms, then you’ll serve that particular set of readers well, and probably do a better job for them than if you assume they are completely new to data transformation and preparation and have no existing skills in this area.

All in all it’s a good book that I can recommend to anyone who wants to learn Power Query and M, and also for intermediate users who want to deepen their knowledge. I still think there’s a need for a book completely devoted to M and covering topics such as custom connectors and dataflows; hopefully someone writes one soon!

Full disclosure: I received a free copy of the book from the author. I’m also the author of a Power Query book myself, but to be honest it’s several years old now and a bit out-of-date, so it’s hard to recommend it any more.

You can buy a copy of this book from Amazon UK here.
