Some Power Query/M Examples

The more I use Power Query and the M language that underpins it, the more I see how useful it is – quite apart from its abilities to import data from other data sources – as a third option (after regular Excel formulas and Power Pivot/DAX) to solve problems in Excel itself. For example, last week I read this blog post by David Hager about finding the number of unique values in a delimited string:

http://dailydoseofexcel.com/archives/2013/08/07/calculating-the-number-of-unique-items-in-a-delimited-string/

It’s an interesting question and the Excel formulas in this post are way beyond what I’m capable of writing. The point is, though, that Power Query can do this very easily indeed just through the UI. Starting with the following table in a worksheet:

image

You just need to import it into a Power Query query:

image

Use the Split Column/By Delimiter/By Comma option on the Input column:

image

This creates as many columns as you’ve got values in the delimited list with the largest number of values:

image

You can then use the Unpivot option on these new columns:

image

Then remove the Attribute column:

image

Next, select Remove Duplicates on the entire table:

image

Finally, do a Group By on the RowID column and Count the number of rows:

image

And bingo, you have the number of distinct values in each delimited list:

image

Here’s the complete code:

let

    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],

    SplitColumnDelimiter = Table.SplitColumn(Source,"Input",Splitter.SplitTextByDelimiter(","),13),

    Unpivot = Table.Unpivot(SplitColumnDelimiter,{"Input.1", "Input.2", "Input.3", "Input.4",

    "Input.5", "Input.6",    "Input.7", "Input.8", "Input.9", "Input.10", "Input.11", "Input.12"

    ,  "Input.13"},"Attribute","Value"),

    RemovedColumns = Table.RemoveColumns(Unpivot,{"Attribute"}),

    DuplicatesRemoved = Table.Distinct(RemovedColumns),

    GroupedRows = Table.Group(DuplicatesRemoved, {"RowID"}, {{"Count of Distinct Values"

    , each Table.RowCount(_), type number}})

in

    GroupedRows

 

Emboldened by this, I turned to another Excel challenge – this time from Chandoo’s blog:

http://chandoo.org/wp/2013/07/16/formula-challenge-001-1/

This time the objective is to split a string containing blocks of text and numbers so that you get everything after the first block of numbers. The whole point of the challenge that Chandoo lays down is to do this in a single Excel formula with no VBA; I was just curious to see how easy it would be to solve the problem in Power Query however many steps it took. Unfortunately this is not a something that you can do just in the UI and you need to write some M, but the code isn’t too bad:

Starting with the following table:

image

You then need to create a new column containing a list of the positions of every occurrence of a numeric character in each string:

image

Then create another new column containing a list of continuous numbers starting from the first number in the previous list and incrementing by 1:

image

Then create another new column containing the minimum value from a list all of the values that are in the previous list and not in the first list you created :

image

This gives the position of the first character in the second block of text, which can then be used to find a substring of the original text:

image

Here’s the complete code:

let

    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],

    InsertedCustom = Table.AddColumn(Source, "NumericPositions", 

    each Text.PositionOfAny([Input], {"0", "1", "2", "3", "4", "5", "6", "7", "8", "9"}, Occurrence.All)),

    InsertedCustom1 = Table.AddColumn(InsertedCustom, "NumberOfPositions", 

    each List.Numbers(List.Min([NumericPositions]), List.Count([NumericPositions]))),

    InsertedCustom2 = Table.AddColumn(InsertedCustom1, "StartChar", 

    each List.Min(List.Difference([NumberOfPositions], [NumericPositions]))),

    InsertedCustom3 = Table.AddColumn(InsertedCustom2, "Output", each Text.Range([Input], [StartChar]))

in

    InsertedCustom3

 

My first reaction when I saw M was that I liked it, but I didn’t think Excel users were prepared to learn yet another language. I still think this (it would have been much better if Power Query, like Power Pivot/DAX, used a language that was consistent with Excel formula language) but as you can see from these examples, sometimes having a choice of tools means it’s more likely that you can find an easy solution to a problem.

27 thoughts on “Some Power Query/M Examples

  1. It’s really difficult to understand, the reason why any company choose power query to try doing some ETL stufff inside an EXCEL File.
    Applying lines of code for ETL, DAX formulas, VBA…. are not SELF SERVICE BI! SELF SERVICE is having everything in place to consume! When I go to McDonalds, I’ll not go to prepare my hamburger… I want it prepared to just eat it! I just need to choose what I want. This is Self Service for me.

    I understand that microsoft want to improve Office sales and customer fidelity, but all decision makers should carefully think on the potential of business data and rules anarchy.

    I love Excel just to explore data. Data should be good, trusted and centralized for everyone see the same version of the truth.

    Just my opinion! 😉
    Thank you Chris for all you support and blogging tips!

    1. Building mini ETL pipelines in excel is something some power data analysts already do. This just provides a tool that makes it easier.

      For the more typical excel user, you’re right, they’ll just consume queries _someone else_ has built, certified, and published to a Power BI site where they can govern access and monitor usage.

  2. The history of M is interesting. My reaction is I wished it was Clojure, my current functional language of choice! I love having a ‘let’ statement for variables. It would be nice if Excel formula language or DAX had a ‘let’ to allow local variables.

    1. I fully agree Donald Qliview scripting has this let ability and it works perfect for simple budgeting and what-if scenarios. Well why? Then they would not be able to sell Excel probably!

  3. Chris, this was a good example you used to understand the power of Power Query. I just recently implemented many to many dimension ETL loading using SSIS. Your query to unpivot a comma separated list of distinct values is similar to what needs to be done when implementing many to many dimensions. This was a pain in SSIS and I also attempted a simpler solution using SQL and XML which was simple but suffered from bad performance. Power Query seems like a simple and perhaps performing way to do it. I do agree with everyone that if Power Query was well integrated into SSIS I would use it over data flow tasks in a heartbeat. Being a hopeless optimist, I’m hoping that will eventually happen and we should take this time to learn the language now. What do you think Chris?

      1. Thanks Chrisd for this examples, they’re really good. When exactly will your Power Query book be published?

  4. Chris, given your extensive command of M functions, I believe that you overlooked a simpler solution to the count duplicate problem e.g.
    let
    Source = Excel.CurrentWorkbook(){[Name=”Table1″]}[Content],
    InsertedCustom = Table.AddColumn(Source, “Custom”, each List.Count(List.Distinct(Text.Split([Input],”,”))))
    in
    InsertedCustom

    And if you wanted to return the distinct values in each row, you can add the line
    InsertedCustom1 = Table.AddColumn(InsertedCustom,”Custom1″, each Text.Combine(List.Distinct(Text.Split([Input],”,”)),”,”))
    (replacing “in InsertedCustom” with “in InsertedCustom1”)

    Clever solution to the second problem! Note that the list of numbers can be written more compactly as {“0”..”9″}

    1. Thanks Colin – the point I was trying to make with the count distinct problem was that it could be solved in Power Query without writing any M code manually at all, and just using the UI;I don’t think many Power Query users will want to learn M. But nice solution!

      BTW I’m on holiday at the moment and will reply to your recent email in a few days when I get home…

      1. Hi Chris,

        Similar to your second solution, this is how I did mine, just sharing so other learners like myself can go through it:

        let

        Source = Excel.CurrentWorkbook(){[Name=”Table3″]}[Content],

        InsertedCustom = Table.AddColumn(Source, “FirstNumberPosition”, each List.PositionOfAny(Text.ToList([Input]),{“0”..”9″},Occurrence.First)),

        InsertedCustom1 = Table.AddColumn(InsertedCustom, “InitialNumberText”, each Text.Range([Input],[FirstNumberPosition])),

        InsertedCustom2 = Table.AddColumn(InsertedCustom1, “PositionFirstAlphabet”, each List.PositionOfAny(Text.ToList([InitialNumberText]),{“a”..”z”,”A”..”Z”})),

        InsertedCustom3 = Table.AddColumn(InsertedCustom2, “Output”, each Text.Range([InitialNumberText], [PositionFirstAlphabet]))

        in

        InsertedCustom3

        This is how I might ‘incrementally’ implement it in Excel, this should be familiar to Excel Power users.

        NB> Please note that we eargerly await your power query book, I hope commenters on this blog would get a discount 🙂

  5. Wow – lots of angry comments here. As a vanilla excel / VBA expert (who is actually an accountant by trade), I would like to say that M and power query blow VBA and Vanilla Excel formulas out of the water. There is so much utility in having a functional programming language available (VBA is not a FUNCTIONAL programming language – no anonymous function capabilities)

    I used code very similar to this example to solve a very time consuming formerly manual task at work. We match debits and credits after summarizing transaction by some criteria – simple scenario. I have simplified things even more here to demonstrate. The following code gets right to the jist of my problem in a few lines of code that vanilla excel cannot handle (because i cannot feed a list of excel results back through excel and developing this in VBA would have taken 10 times as long)

    ImportedExcel = Excel.Workbook(File),
    Table1_Table = ImportedExcel{[Item=”Table1″,Kind=”Table”]}[Data],
    AmountsColumn = Table1_Table[Amount],

    DistinctAmounts = List.Distinct(AmountsColumn),

    DoubledUpAmounts = List.Difference(AmountsColumn,DistinctAmounts),
    DuplicatesRemoved = List.Distinct(DoubledUpAmounts)

    There in 6 short easy to read steps I have shredded a formerly very difficult task that would have been a nightmare to code with VBA. Eat your hearts out PowerQuery haters.

    Also see further examples at https://www.youtube.com/channel/UCxoZHrue1ZOFdDr7CGX43kw/videos

  6. Just chiming in from a DBA perspective:
    1) Who came up with the name M? There’s a database-style language called MUMPS, often referred to by M. This makes finding details on this freaking horrific. And that doesn’t even include that “I’m” comes up when I search google for “M”.
    2) Saw some demos on it this weekend (Brian Smith with Microsoft) – really awesome for end users… but I want a way to use that on the server side. Let the Analysts clean the data file, then give the script to me so that I can automatically do it going forward. But he was _really_ non-committal about that feature. Alas.

    1. Some answers…
      1) M is only the unofficial name. It was the code name, when M was a standalone language (well before Power Query was thought of – it was developed for a completely different project), and the official name now is “Power Query Formula Language” but there are still lots of references to M around in the docs.
      2) A lot of people have been asking for this. Right now, you can refresh Excel workbooks that contain Power Query queries when you upload them to a Power BI site in Office 365; I would love to see some kind of standalone server for Power Query.

    2. M was originally publicly disclosed as a codename for the language part of the now defunct “Oslo” modeling platform. In that context “M” stood for “Model”. The potential conflict with MUMPS was evaluated at that time but because “M” was considered to be just a codename it was kept assuming that some other name would be chosen upon product release. I left MSFT in 2010 so can’t comment on the decision to keep it with the release of Power Query. It does appear that the documentation also refers to it as the “Power Query for Excel
      Formula Language” and the shorter “Power Query Formula Language”. “M” (and it’s supporting infrastructure) was developed to be a very broadly applicable language and the original designers very much envisioned a world with “M” being adopted into various layers of MSFT’s servers and tools. How much of that vision still exists I don’t know but it is notoriously difficult to drive new plumbing into other product groups that you don’t control. So while the Office group has found utility for “M” in Power Query a completely different organization controls SSIS. I imagine that why you didn’t get any commitment from Brian as he is in the Office group.

      And to make things more confusing, notice that MSFT research has a project called M# that apparently has nothing to do with M. There is also a MSFT partner with a product for more rapidly building ASP.NET applications through DSLs. This really smells like it could be upon M as this was one of the primary initial use cases scoped out for M early on. AND, MSFT has a new effort also called Oslo for surfacing data in the Office Graph…

      M# – DSLs over ASP.NET
      http://www.msharp.co.uk/Home.html

      M# – MSFT Research extensions to C# for Midori
      http://www.zdnet.com/microsofts-midori-the-m-connection-7000024664/

      Oslo – New tool for surfacing data in the Office Graph
      http://www.pcworld.com/article/2153784/microsoft-oslo-will-help-you-stay-on-top-of-what-s-relevant.html

  7. Hi Chris, do you know if there is a way to define a custom aggregate function to use with Table.Group? I am looking for something that would accumulate the values in a comma separated way (for example instead of averaging 1, 2 and 3 it would accumulate them and output “1,2,3”. thanks.

      1. Thanks. I think I manager to do it, but there might be a quicker option.

        My source table would look like this:

        System Domain
        ————————-
        App2 HTR
        App1 SC
        App2 RTR

        and in return I want something like this:

        System Domains
        ————————-
        App1 SC
        App2 HTR, RTR

        My steps are:

        let
        Source = Excel.CurrentWorkbook(){[Name=”Table1″]}[Content],
        CombineWithComma = Combiner.CombineTextByDelimiter(“, “),
        GroupedRows = Table.Group(Source, {“System”}, {{“rows”, each _, type table}}),
        Results = Table.AddColumn(GroupedRows, “Domains”, each CombineWithComma(Table.ToList(Table.SelectColumns([rows], “Domain”)))),
        #”Removed Columns” = Table.RemoveColumns(Results,{“rows”}),
        #”Sorted Rows” = Table.Sort(#”Removed Columns”,{{“System”, Order.Ascending}})
        in
        #”Sorted Rows”

        Do you think this is the right way to do it?
        Thanks for your help, Ben

  8. HI Chris,

    Would you be able to guide us on how to write nested queries with M? I have a list of customer id’s stored in data source, and a huge data base of transaction lines in another. I would like to write a powerquery query that only pulls in records of the list of customers maintained in a table. Something similar to nesting a SQL query in another SQL query after the ‘where’ keyword.

    Any help would me much appreciated. thanks in advance!

  9. I realise this thread is old but I just read it today. My two penn’orth is that I recreated your Table1 in a virgin Excel file then got Get&Transform to use that table as the source of a new query. I then copied and pasted your distinct count/value M code into the Query Advanced View, pressed OK and bingo. The answer!
    Isn’t that part of what a language and sharing the language is all about!!!

    Duncan

Leave a Reply to David Hager (@dhExcel)Cancel reply