
Power Query Comes To Azure Data Factory With Wrangling Data Flows

One of the many big announcements at Build this week, and one that caused a lot of discussion on Twitter, was about Wrangling Data Flows in Azure Data Factory. You can read the blog post here:

…but what isn’t clear from this is that it’s basically Power Query Online integrated into ADF. You can see it in action by watching the following video – the demo of Wrangling Data Flows starts at around the 21 minute mark:


As the presenter says, the Power Query Online editor generates M in the background as you would expect and “we are going to take this M and translate it into Spark and run it over big data”. Query folding to Spark, basically. More technical detail about all this is available here:

…including a document discussing which M functions currently support query folding and which ones as yet don’t. Obviously, this feature will only work well if as much query folding as possible takes place.
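To make that concrete, here is a minimal sketch of the kind of M the editor generates for a filter, an aggregation and a sort. The inline `#table` sample data and the `Region` and `Amount` column names are hypothetical, made up for illustration; in ADF the source would be a dataset rather than an inline table. Whether each step is actually translated to Spark depends on the supported-functions document mentioned above, so treat this as an illustration of the M, not a statement of what folds.

```powerquery
let
    // Hypothetical inline sample data standing in for a real source
    Source = #table(
        type table [Region = text, Amount = number],
        {{"North", 120}, {"South", 80}, {"North", 45}, {"East", 200}}
    ),
    // Filter: keep only rows where Amount is greater than 50
    Filtered = Table.SelectRows(Source, each [Amount] > 50),
    // Aggregate: total Amount per Region
    Grouped = Table.Group(
        Filtered,
        {"Region"},
        {{"Total", each List.Sum([Amount]), type number}}
    ),
    // Sort: largest total first
    Sorted = Table.Sort(Grouped, {{"Total", Order.Descending}})
in
    Sorted
```

Each step is a plain function call over the previous step's table, which is what makes it possible in principle to translate the whole query into operations on another engine.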

This feels like a much more significant win for team Power Query than the integration with SSIS that was announced recently, if only because SSIS is a bit legacy and ADF is the cool new thing. I wonder if this opens up the possibility of integration between Power BI dataflows and ADF in the future, as another example of how self-service BI solutions can be easily transitioned into centrally-managed, enterprise-grade BI solutions? If that happens I hope someone sorts out the dataflow/data flow naming mess.

You can sign up for the preview of Wrangling Data Flows here.

9 thoughts on “Power Query Comes To Azure Data Factory With Wrangling Data Flows”

  1. Chris, there seems to be a big overlap in functionality between mapping data flows and wrangling data flows. You can aggregate, filter and sort in both. There is also functionality unique to each, such as window functions in mapping. I can’t seem to figure out why they have both and don’t just beef up wrangling, since it’s Power Query and cross-platform. What do you think?

  2. Notable unsupported functionality:
    Merge columns
    Split column
    Append queries
    Changing Column Types
    “Use first row as headers” or “Use headers as first row”

    It’s surprising to see Append and Changing Column Types in this list. The rest are pretty standard query folding limitations, however.

  3. Random thoughts –
    Since ADFv2 is essentially one giant JSON file, does this open up an avenue for BIML-like scripting tools to auto-generate M document pipelines?
    Unless, of course, an ADFv2 ForEach loop may soon be capable of feeding variables into an M document?

    1. Simon, ADF has its own templating solution called the ARM template. This is what they use to create the templates in the template gallery, such as the SCD Type 2 template.

      Your point about auto-generating M code sounds intriguing.

  4. Hello Chris, nice article, thank you. We have been testing ADF V2 and it looks like it would work for our ETL process. As Data Wrangling is in limited preview, I’m thinking I should use ADF data flows to replicate our current Power Query ETL. However, I’m concerned that the data flow will become rather long and difficult to manage, as the ADF GUI represents it horizontally. Interested to hear your views: should I use data flows or wait for GA of wrangling data flows?

    Many thanks!

  5. Hi folks, I’m struggling with the below error:
    Expression.Error: The Power Query Spark Runtime does not support the function Table.AddIndexColumn.

    1. Chris Webb says:

      A lot of Power Query functionality is not yet supported in Power Query inside ADF, and I’m almost certain that adding an index column is one of these things.
