Power Query Comes To Azure Data Factory With Wrangling Data Flows

One of the many big announcements at Build this week, and one that caused a lot of discussion on Twitter, was about Wrangling Data Flows in Azure Data Factory. You can read the blog post here:

https://azure.microsoft.com/en-us/blog/analytics-in-azure-remains-unmatched-with-new-innovations/

…but what isn’t clear from this is that it’s basically Power Query Online integrated into ADF. You can see it in action by watching the following video – the demo of Wrangling Data Flows starts at around the 21 minute mark:

https://mybuild.techcommunity.microsoft.com/sessions/76997


As the presenter says, the Power Query Online editor generates M in the background as you would expect and “we are going to take this M and translate it into Spark and run it over big data”. Query folding to Spark, basically. More technical detail about all this is available here:

https://github.com/gauravmalhot/wranglingdataflow

…including a document discussing which M functions currently support query folding and which ones as yet don’t. Obviously, this feature will only work well if as much query folding as possible takes place.
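To make the folding point concrete, here is a toy Python sketch of the general idea. It is not ADF's actual implementation. A chain of transformation steps is pushed down as a single SQL-style query for as long as each step is "foldable", and everything from the first unsupported step onwards has to run afterwards. The `Step` type, the `fold_steps` function and the set of foldable step kinds are all hypothetical and exist only for illustration. The real list of supported M functions is the one in the GitHub repo above.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical: which step kinds "fold" is an assumption for illustration,
# not the real support matrix for Wrangling Data Flows.
FOLDABLE_KINDS = {"filter", "select", "sort"}


@dataclass(frozen=True)
class Step:
    kind: str  # e.g. "filter", "select", "sort", "split_column"
    arg: str   # e.g. "Amount > 100", "Region, Amount", "Amount DESC"


def fold_steps(table: str, steps: list[Step]) -> tuple[str, list[Step]]:
    """Fold the longest foldable prefix of `steps` into one SQL-style query.

    Returns the query plus the remaining steps, which could not be pushed
    down and would have to run after the folded query.
    """
    predicates: list[str] = []
    columns = "*"
    order_by = ""
    folded = 0
    for step in steps:
        if step.kind not in FOLDABLE_KINDS:
            break  # folding stops at the first unsupported step
        if step.kind == "filter":
            predicates.append(f"({step.arg})")
        elif step.kind == "select":
            columns = step.arg  # a later select replaces an earlier one
        elif step.kind == "sort":
            order_by = step.arg  # a later sort replaces an earlier one
        folded += 1

    query = f"SELECT {columns} FROM {table}"
    if predicates:
        query += " WHERE " + " AND ".join(predicates)
    if order_by:
        query += f" ORDER BY {order_by}"
    return query, steps[folded:]


if __name__ == "__main__":
    pipeline = [
        Step("filter", "Amount > 100"),
        Step("select", "Region, Amount"),
        Step("split_column", "Region"),  # treated as non-foldable here
        Step("sort", "Amount DESC"),
    ]
    query, leftover = fold_steps("sales", pipeline)
    print(query)
    print([s.kind for s in leftover])
```

In this toy model the sort never gets pushed down, even though sorting is foldable, because it comes after a step that isn't. That is why the order of steps, and not just which functions you use, can matter for how much work ends up running in Spark.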

This feels like a much more significant win for team Power Query than the integration with SSIS that was announced recently, if only because SSIS is a bit legacy and ADF is the cool new thing. I wonder if this opens up the possibility of integration between Power BI dataflows and ADF in the future, as another example of how self-service BI solutions can be easily transitioned into centrally-managed, enterprise-grade BI solutions? If that happens I hope someone sorts out the dataflow/data flow naming mess.

You can sign up for the preview of Wrangling Data Flows here.

4 responses

  1. Chris, there seems to be a big overlap in functionality between mapping data flows and wrangling data flows. You can aggregate, filter and sort in both. There is also functionality unique to each, such as window functions in mapping data flows. I can’t figure out why they have both rather than just beefing up wrangling data flows, since it’s Power Query and works across platforms. What do you think?

  2. Notable unsupported functionality:
    Merge columns
    Split column
    Append queries
    Changing Column Types
    “Use first row as headers” or “Use headers as first row”

    It’s surprising to see Append and Changing Column Types in this list. The rest are pretty standard query folding limitations, however.

  3. Random thoughts –
    Since ADFv2 is essentially one giant JSON file, does this open up an avenue for BIML-like scripting tools to auto-generate M document pipelines?
    Unless, of course, an ADFv2 ForEach loop may soon be capable of feeding variables into an M document?

    • Simon, ADF has its own templating solution: ARM templates. This is what they use to create the templates in the template gallery, such as the SCD Type 2 template.

      Your point about auto-generating M code sounds intriguing.
