Power Query Comes To Azure Data Factory With Wrangling Data Flows

Chris Webb Azure Data Factory, M, Power Query May 10, 2019

One of the many big announcements at Build this week, and one that caused a lot of discussion on Twitter, was about Wrangling Data Flows in Azure Data Factory. You can read the blog post here:

https://azure.microsoft.com/en-us/blog/analytics-in-azure-remains-unmatched-with-new-innovations/

…but what isn’t clear from this is that it’s basically Power Query Online integrated into ADF. You can see it in action by watching the following video – the demo of Wrangling Data Flows starts at around the 21-minute mark:

https://mybuild.techcommunity.microsoft.com/sessions/76997

As the presenter says, the Power Query Online editor generates M in the background, as you would expect, and “we are going to take this M and translate it into Spark and run it over big data”. Query folding to Spark, basically. More technical detail about all this is available here:

https://github.com/gauravmalhot/wranglingdataflow

…including a document discussing which M functions currently support query folding and which ones as yet don’t. Obviously, this feature will only work well if as much query folding as possible takes place.
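Since so much depends on which functions fold, it can help to scan a query for functions that are known not to translate before trying to run it. What follows is only a rough sketch in Python, not an official tool. The `ASSUMED_UNSUPPORTED` set is an assumption pieced together from the limitations readers report in the comments on this post (mapped to the M functions the editor normally generates for those operations), and `find_unsupported`, `SAMPLE_QUERY` and `SalesData` are hypothetical names invented for the example. The authoritative list is the one in the documentation linked above, and it changes over time.

```python
import re

# ASSUMPTION: illustrative list only, based on limitations mentioned in the
# comments on this post. Check the official documentation for the real list.
ASSUMED_UNSUPPORTED = {
    "Table.CombineColumns",        # "Merge columns"
    "Table.SplitColumn",           # "Split column"
    "Table.Combine",               # "Append queries"
    "Table.TransformColumnTypes",  # "Changing column types"
    "Table.PromoteHeaders",        # "Use first row as headers"
    "Table.DemoteHeaders",         # "Use headers as first row"
    "Table.AddIndexColumn",        # reported in an Expression.Error
}

# Matches dotted library-style names such as Table.SelectRows.
# This is a plain text scan: it does not parse M, so names that appear
# inside string literals or comments will also be reported.
_M_FUNCTION = re.compile(r"\b[A-Z][A-Za-z]*\.[A-Z][A-Za-z0-9]*\b")


def find_unsupported(m_script, unsupported=ASSUMED_UNSUPPORTED):
    """Return the sorted function names in m_script that are in `unsupported`."""
    used = set(_M_FUNCTION.findall(m_script))
    return sorted(used & set(unsupported))


# Hypothetical query: SalesData stands in for whatever source the flow uses.
SAMPLE_QUERY = """
let
    Source = SalesData,
    Filtered = Table.SelectRows(Source, each [Amount] > 0),
    Indexed = Table.AddIndexColumn(Filtered, "Index", 0, 1)
in
    Indexed
"""

if __name__ == "__main__":
    for name in find_unsupported(SAMPLE_QUERY):
        print("May not fold to Spark:", name)
```

A text scan like this only flags candidates. Whether a given step actually folds is decided by the service when the query is translated.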

This feels like a much more significant win for team Power Query than the integration with SSIS that was announced recently, if only because SSIS is a bit legacy and ADF is the cool new thing. I wonder if this opens up the possibility of integration between Power BI dataflows and ADF in the future, as another example of how self-service BI solutions can be easily transitioned into centrally-managed, enterprise-grade BI solutions? If that happens I hope someone sorts out the dataflow/data flow naming mess.

You can sign up for the preview of Wrangling Data Flows here.

Published by Chris Webb

My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.

Published May 10, 2019

9 thoughts on “Power Query Comes To Azure Data Factory With Wrangling Data Flows”

suhail ali says:

May 10, 2019 at 8:36 pm

Chris, there seems to be a big overlap in functionality between mapping data flows and wrangling data flows. You can aggregate, filter and sort in both. There is also functionality unique to each, such as window functions in mapping. I can’t figure out why they have both rather than just beefing up wrangling, since it is Power Query and works across platforms. What do you think?

Simon Nuss says:

May 13, 2019 at 6:11 pm

Notable unsupported functionality:
Merge columns
Split column
Append queries
Changing Column Types
“Use first row as headers” or “Use headers as first row”

It’s surprising to see Append and Changing Column Types in this list. The rest are pretty standard query folding limitations, however.

Simon Nuss says:

May 13, 2019 at 6:21 pm

Random thoughts –
Since ADFv2 is essentially one giant JSON file, does this open up an avenue for BIML-like scripting tools to auto-generate M document pipelines?
Unless, of course, an ADFv2 ForEach loop may soon be capable of feeding variables into an M document?

1. suhail ali says:
  
  May 15, 2019 at 10:16 pm
  
  Simon, ADF has its own templating solution called the ARM template. This is what they use to create the templates in the template gallery, such as the SCD Type 2 template.
  
  Your point about auto-generating M code sounds intriguing.
  
Pingback: ADF zonder SSIS(-IR)? - Monkey Consultancy
Darran says:

November 4, 2019 at 3:06 pm

Hello Chris, nice article, thank you. We have been testing ADF V2 and it looks like it would work for our ETL process. As Data Wrangling is in limited preview, I’m thinking I should use ADF data flows to replicate our current Power Query ETL. However, I’m concerned that the data flow will become rather long and difficult to manage, as the ADF GUI lays it out horizontally. Interested to hear your views – should I use data flows or wait for GA of wrangling data flows?

Many thanks!

paSQuaLe ceglie says:

June 3, 2021 at 8:31 am

Hi folks, I’m struggling with the below error:
Expression.Error: The Power Query Spark Runtime does not support the function Table.AddIndexColumn.

1. Chris Webb says:
  
  June 3, 2021 at 11:02 am
  
  A lot of Power Query functionality is not yet supported in Power Query inside ADF, and I’m almost certain that adding an index column is one of these things.
  
  1. NK says:
    
    February 10, 2022 at 10:13 pm
    
    Any suggested alternatives for this using custom M Code? Thx
    