Using Function.ScalarVector() To Optimise Calls To M Functions

One of the most common issues faced when calling web services in M is that the the easiest way of doing so – creating a function that calls the web service, then calling the function once per row in a table using the Invoke Custom Function button – is very inefficient. It’s a much better idea to batch up calls to a web service, if the web service supports this, but doing this forces you to write more complex M code. It’s a problem I wrestled with last year in my custom connector for the Cognitive Services API, and in that case I opted to create functions that can be passed lists instead (see here for more information on how functions with parameters of type list work); I’m sure the developers working on the new AI features in dataflows had to deal with the same problem. This problem is not limited to web services either: calculations such as running totals also need to be optimised in the same way if they are to perform well in M. The good news is that there is a new M function Function.ScalarVector() that allows you to create functions that combine the ease-of-use of row-by-row requests with the efficiency of batch requests.

I’m going to try to keep things as simple as possible while showing how it works. Consider the following M function, called SingleValueUpper:

(InputText as text) => 
    Text.Upper(InputText)

It takes a single text value and returns the same text converted to upper case. Now consider the following query called Product that returns a table of fruit names in lower case:

#table(
    type table [Fruit=text],
    {{"apples"},{"oranges"},{"pears"}}
    )

image

The function can be called on this table in a third query with the following expression (which is basically what the Invoke Custom Function button produces):

Table.AddColumn(
    Product, 
    "Fruit Upper", 
    each SingleValueUpper([Fruit])
    )

The output of this is:

image

In this example the function is called three times – once for each row – and as I said if you were calling a web service in this way, for example, it would be very inefficient.

Now consider the following M function, called ListUpper:

(InputText as list) => 
    List.Transform(
        InputText, 
        each Text.Upper(_)
        )

This function takes a list of text values, converts each of the text values to upper case, and returns a list of the results. Calling the function with the contents of the Fruit column of the table returned by the Product query, like so:

ListUpper(Product[Fruit])

…returns the following list:

image

If you were calling a web service like this – perhaps by making a single POST request with all of these values in the body – it would be much more efficient.

Function.ScalarVector allows you to call a function like this through the Invoke Function Button. The following function, called ScalarVectorUpper in my example, shows how:

Function.ScalarVector(
    type function(Fruit as text) as text,
    (InputTable) =>
        let
            BufferTable = Table.Buffer(InputTable),
            InputList = BufferTable[Fruit],
            CallFunction = ListUpper(InputList)            
        in
            CallFunction
)

image

Function.ScalarVector is given two arguments:

  • A function type whose parameters match the columns from the input table that we’re interested in using. In this case the input table is the Product table and it has one text column called Fruit; we want to pass the contents of the Fruit column to this function.
  • A function that is passed a table (in this case called InputTable) that has the same columns as those listed in the function type. The documentation says this function “must not enumerate its input table more than once” so it uses Table.Buffer() to make sure this is the case; it then takes the contents of the Fruit column from this table and passes it to the ListUpper function defined above, and returns a list of upper case text values.

Basically, it creates a function that looks like it is being called on a row-by-row basis but which is actually called once for the whole table.

The result of calling this function on the table returned by the Product query:

Table.AddColumn(
    Product, 
    "Fruit Upper", 
    each ScalarVectorUpper([Fruit])
    )

…is the same as before, but potentially a lot more efficient:

image

This may seem obscure, but personally I think this is incredibly useful and important. Anyone writing a custom connector in the future (although Function.ScalarVector() isn’t available in the current release of the Power Query SDK I assume it will appear soon) is likely to want to use it, and remember also that custom connectors can be used to create M function libraries and I believe this scenario will be officially supported too.

You can download the example pbix file for this post here.

16 responses

  1. Hi Chris, very interesting topic. The how-to is a bit too convoluted for what I believe should be a more straightforward feature…
    Have you tested whether this would have performance benefits locally vs regular function invocation?

  2. I think the point is that it does not have any inherent performance benefits, but it does allow you to implement your functions in a more efficient way than would otherwise be possible.

  3. Hi Chris

    I have a Query Called FOLDER Defined as below

    let
    Source = Folder.Files(“C:\DATA”),
    mRemOthCols = Table.SelectColumns(Source,{“Folder Path”, “Name”}),
    mAddCustCol = Table.Combine(Table.AddColumn(mRemOthCols, “Custom”, each fGetDataSht([Folder Path],[Name]))[Custom])
    in
    mAddCustCol

    i have a Function called fGetDataSht defined as below

    let
    fGetDataSht=(filePath as text, fName as text)=>
    let
    Source = Excel.Workbook(File.Contents(filePath & fName), true, true){0}[Data]
    in
    Source
    in
    fGetDataSht

    Currently the function is getting invoked as many times as there are files in the Folder.
    How can we use Function.ScalarVector in this case

    • You could rewrite your function using Function.ScalarVector but what do you want to achieve here? Unlike a web service there’s no way you can avoid opening Excel files individually. There might be some improvements possible if you’re getting data from multiple sheets in the same workbook though.

  4. @Chris – I tried doing this, but i am doing something wrong. Can you kindly look at the below

    Function.ScalarVector(
    type function(filePath as any,fName as any) as table,
    (InputTable) =>
    let
    BufferTable = Table.Buffer(InputTable),
    CallFunction = fGetDataSht(each BufferTable[Folder Path]{_},each BufferTable[Name]{_})
    in
    CallFunction
    )

  5. Pingback: Optimizing M Function Calls With Function.ScaleVector() – Curated SQL

  6. Pingback: Using Function.ScalarVector() To Optimise Calls To M Functions | Pardaan.com

  7. I’ve created a really primitive example at https://gist.github.com/CurtHagenlocher/498d68282f0d355fc55ef7ecb563805b which uses the Cognitive Services languages API to detect language. Even though it uses Table.AddColumn to detect language in a single row, it leverages the vectorization support to make only one API call per 1000 input rows.

    Note that I’m only buffering each page of 1000 values instead of buffering the entire column. This avoids having to enumerate the input table twice. Most of the magic in the implementation is there to let us stitch together multiple enumerations to avoid having to buffer too much. To take advantage of this streaming capability, you need to use something like List.Split to separate a list into a streaming-capable list of lists and then List.Combine to produce a single list from the output.

  8. @Chris – Could you kindly point out how consolidation of data in multiple sheets of a single file can be optimised using Function ScalarVector

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: