Flattening A Parent/Child Relationship In Data Explorer (Power Query)

NOTE: This post was written before Data Explorer was renamed as Power Query. All of the content is still relevant to Power Query.

I was teaching my SSAS cube design and performance tuning course this week (which I’ll be teaching in Sydney and Melbourne next month, along with some MDX – places still available!) and demonstrating BIDS Helper’s excellent functionality for flattening parent/child relationships, and it got me thinking – can I do the same thing in Data Explorer? Not that I need to do this in Data Explorer, you know, but it’s the kind of challenge I like to set myself. Of course you can do it, and quite elegantly, and since I learned yet more interesting stuff about Data Explorer and M while I was cracking this problem I thought I’d blog about it.

Here’s what I want to do. Consider the parent/child hierarchy in the DimEmployees table in the Adventure Works DW database:

image

Each row represents an employee, EmployeeKey is the primary key and ParentEmployeeKey is the key of the employee’s boss. Therefore, by joining the table to itself, we can recreate the org chart of the Adventure Works company (ie who reports to who). The problem though is that we need to join the table to itself multiple times to do this, and the number of times we need to do the join depends on the data itself. If you flatten a parent/child hierarchy by doing this, the end result should have a series of columns representing each level in the hierarchy, and look something like this:

image

This problem can be solved in SQL reasonably easily, even if the SQL you end up writing might look a little scary (see the views that BIDS Helper generates for an example of this). What about Data Explorer?

At the heart of my approach was a recursive function. I’ve blogged about creating functions in Data Explorer already, so you might want to read that post for some background. Here’s my function declaration:

let

    Source = (FromTable, KeyColumn, ParentKeyColumn, ToTable, optional Depth) =>

let

    GetDepth = if (Depth=null) then 1 else Depth,

    GetKeyColumn = if (Depth=null) then KeyColumn

        else Number.ToText(GetDepth-1) & "." & KeyColumn,

    GetParentKeyColumn = Number.ToText(GetDepth) & "." & ParentKeyColumn,

    JoinTables = Table.Join(FromTable,{GetKeyColumn},

        Table.PrefixColumns(ToTable , Number.ToText(GetDepth)),

            {GetParentKeyColumn}, JoinKind.LeftOuter),

    FinalResult = if

        List.MatchesAll(Table.Column(JoinTables, GetParentKeyColumn), each _=null)

        then FromTable

        else RecursiveJoin(JoinTables, KeyColumn, ParentKeyColumn, ToTable, GetDepth+1)

in

    FinalResult

in

    Source

A few interesting things to point out:

  • I’ve used a LET statement inside my function declaration, so I can have multiple statements inside it
  • I’ve used Table.Join to do the left outer join between the two tables I’m expecting
  • The parameters I’m using are:
    • FromTable – the table on the left hand side of the join. When the function is first called, this should be a table that contains the Employees who have no parents (ie where ParentEmployeeKey is null); when the function calls itself, this will be the result of the join.
    • ToTable – the table on the right hand side of the join. This is always a table that contains the Employees who do have parents.
    • KeyColumn – the name of the Employee’s key column
    • ParentKeyColumn – the name of the Employee’s parent key column
  • I’ve used Table.PrefixColumn to rename all the columns in the table on the right hand side of the join, prefixing them with the depth of the call stack, so I get distinct column names.
  • The function calls itself until it finds it has done a join where the last ParentKeyColumn contains only null values. I’ve used List.MatchesAll to check this.

Here’s the call to this function – you only need to include one step in the Data Explorer query to do this – to return the flattened structure:

= RecursiveJoin(

    Table.SelectRows(Employees, each [ParentEmployeeKey]=null),

    "EmployeeKey",

    "ParentEmployeeKey",

    Table.SelectRows(Employees, each [ParentEmployeeKey]<>null)

    )

And here’s the output:

image

In this case the output isn’t exactly the same as what BIDS Helper might produce, because BIDS Helper has some special requirements for SSAS user hierarchies. Also, since I’m still learning Data Explorer and M, I’m not sure my code in the most efficient, elegant way. But I still think it’s an interesting example and I hope it’s useful to other Data Explorer enthusiasts out there – we’re a small but growing band!

You can download my demo workbook here.

8 thoughts on “Flattening A Parent/Child Relationship In Data Explorer (Power Query)

  1. Chris, great post.

    I am looking to write a function that takes a table as a parameter, then for each record, run a process (which I have already completed) that takes the records as parameters for a web query, and then combine the results of all these in a single table ( the result will be much longer than the parameter table, but not a determinate length)

    Any thoughts on how to take each of the records from one table as parameters for a function that returns a table and combine all the tables into a single result?

    It seems like I need to get recursive on this problem, and that I might need to write a function that calls my completed function as a parameter, but I am not good enough at this level of programming to come up with the result myself.

    1. Hi Derik,

      I think it should be quite easy. You can see some examples of calling a web service from Power Query here: http://blogs.msdn.com/b/powerbi/archive/2013/07/07/getting-started-with-pq-and-pm.aspx; I think what you would want to do is create your function so that it works for just one value in your table, then add a custom column to your original query and call the function for each row. It’s almost the same as what I’m doing in the YTD example here: http://cwebbbi.wordpress.com/2013/10/18/implementing-common-calculations-in-power-query/. In your case, though, in each row in the custom column you’d get a table object. It would then be easy to combine all the values in all of the table objects on each row, just by clicking on the header of that row.

  2. Superb function!
    For ragged hierarchies (SSAS or Power BI) I need a table, where resulting level with null values should be replaced with the level value from its parent.

    Exists there also a solution to handle this more generically (Level .. n.Level) instead of adding an additional column for each a level attribute (- and remove the genuine ones afterwards) ?

    = Table.AddColumn(FlattenedParentChildFromChris, “NewLevelColumn_xx”,
    each List.Last(List.RemoveNulls({[Level], [1.Level], [2.Level], [3.Level], .. , [n.Level]})) )

  3. Hello,

    I am trying to make use of your function but the it doesn’t understand what “GetDepth” is? Maybe I’m missing something as I’m very much a novice with using ‘M’.

    Any assistance appreciated as I do believe this function will greatly assist with what I am trying to do.

    Thanks.

    1. No worries.. I found an issue elsewhere in the code but I am now having issues with “RecursiveJoin” reference towards the end of the code, perhaps because of my mis-understanding of the ‘M’ language I suspect. Any help appreciated. The code I have in the advanced editor is as follows:

      let
      Source = (FromTable, KeyColumn, ParentKeyColumn, ToTable, optional Depth) =>
      let
      GetDepth = if (Depth=null) then 1 else Depth,
      GetKeyColumn = if (Depth=null) then KeyColumn
      else Number.ToText(GetDepth-1) & “.” & KeyColumn,
      GetParentKeyColumn = Number.ToText(GetDepth) & “.” & ParentKeyColumn,
      JoinTables = Table.Join(FromTable,{GetKeyColumn},
      Table.PrefixColumns(ToTable , Number.ToText(GetDepth)),
      {GetParentKeyColumn}, JoinKind.LeftOuter),
      FinalResult = if
      List.MatchesAll(Table.Column(JoinTables, GetParentKeyColumn), each _=null)
      then FromTable
      else RecursiveJoin(JoinTables, KeyColumn, ParentKeyColumn, ToTable, GetDepth+1)
      in
      FinalResult
      in
      Source

Leave a Reply to MarkCancel reply