Using Function.ScalarVector() To Optimise Calls To M Functions

One of the most common issues faced when calling web services in M is that the easiest way of doing so – creating a function that calls the web service, then calling that function once per row in a table using the Invoke Custom Function button – is very inefficient. It’s a much better idea to batch up calls to a web service, if the web service supports this, but doing so forces you to write more complex M code. It’s a problem I wrestled with last year in my custom connector for the Cognitive Services API, and in that case I opted to create functions that can be passed lists instead (see here for more information on how functions with parameters of type list work); I’m sure the developers working on the new AI features in dataflows had to deal with the same problem. The problem is not limited to web services either: calculations such as running totals also need to be optimised in the same way if they are to perform well in M. The good news is that there is a new M function, Function.ScalarVector(), that allows you to create functions combining the ease of use of row-by-row requests with the efficiency of batch requests.
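
Running totals illustrate the point well: calling a scalar “sum so far” function once per row re-scans the data for every row (quadratic cost), whereas a list-based version makes a single pass. Here’s a rough sketch of such a single-pass function – the names are mine, not from any library:

(Values as list) =>
let
    //Buffer the list so it is only enumerated once
    Buffered = List.Buffer(Values),
    //Carry the total forward in a single pass with List.Generate
    RunningTotals = List.Generate(
        () => [i = 0, total = Buffered{0}],
        each [i] < List.Count(Buffered),
        each [i = [i] + 1, total = [total] + Buffered{[i] + 1}],
        each [total]
    )
in
    RunningTotals

Called with {1, 2, 3} this returns {1, 3, 6}, touching each value only once.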

I’m going to try to keep things as simple as possible while showing how it works. Consider the following M function, called SingleValueUpper:

(InputText as text) =>
Text.Upper(InputText)

It takes a single text value and returns the same text converted to upper case. Now consider the following query called Product that returns a table of fruit names in lower case:

#table(
type table [Fruit=text],
{{"apples"},{"oranges"},{"pears"}}
)

[Image: the Product table, with a single Fruit column containing apples, oranges and pears]

The function can be called on this table in a third query with the following expression (which is basically what the Invoke Custom Function button produces):

Table.AddColumn(
Product,
"Fruit Upper",
each SingleValueUpper([Fruit])
)

The output of this is:

[Image: the Product table with a new Fruit Upper column containing APPLES, ORANGES and PEARS]

In this example the function is called three times – once for each row – and, as I said, if you were calling a web service in this way it would be very inefficient.

Now consider the following M function, called ListUpper:

(InputText as list) =>
List.Transform(
InputText,
each Text.Upper(_)
)

This function takes a list of text values, converts each of the text values to upper case, and returns a list of the results. Calling the function with the contents of the Fruit column of the table returned by the Product query, like so:

ListUpper(Product[Fruit])

…returns the following list:

[Image: a list containing APPLES, ORANGES and PEARS]

If you were calling a web service like this – perhaps by making a single POST request with all of these values in the body – it would be much more efficient.
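
To make that concrete, a list-based function like ListUpper could wrap a single batched call rather than one call per value. The sketch below is hypothetical – the endpoint URL and the request/response shapes are invented for illustration, not a real API:

//Hypothetical: send all the values in one POST request and read back a list of results
(InputText as list) =>
let
    Body = Json.FromValue([values = InputText]),
    Response = Web.Contents(
        "https://example.com/api/upper",  //invented endpoint
        [Content = Body, Headers = [#"Content-Type" = "application/json"]]
    ),
    Parsed = Json.Document(Response)
in
    Parsed[results]

Passing the Content option makes Web.Contents issue a POST, so however many values are in the list, only one request is made.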

Function.ScalarVector allows you to call a function like this through the Invoke Custom Function button. The following function, called ScalarVectorUpper in my example, shows how:

Function.ScalarVector(
type function(Fruit as text) as text,
(InputTable) =>
let
BufferTable = Table.Buffer(InputTable),
InputList = BufferTable[Fruit],
CallFunction = ListUpper(InputList)
in
CallFunction
)


Function.ScalarVector is given two arguments:

  • A function type whose parameters match the columns from the input table that we’re interested in using. In this case the input table is the Product table and it has one text column called Fruit; we want to pass the contents of the Fruit column to this function.
  • A function that is passed a table (in this case called InputTable) that has the same columns as those listed in the function type. The documentation says this function “must not enumerate its input table more than once” so it uses Table.Buffer() to make sure this is the case; it then takes the contents of the Fruit column from this table and passes it to the ListUpper function defined above, and returns a list of upper case text values.

Basically, it creates a function that looks like it is being called on a row-by-row basis but which is actually called once for the whole table.
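
The same pattern extends to more than one column: declare one parameter per column in the function type, and the inner function receives a table containing those columns. A sketch, assuming a hypothetical table with two text columns called First and Last:

Function.ScalarVector(
    //One parameter per input column
    type function (First as text, Last as text) as text,
    (InputTable) =>
    let
        BufferTable = Table.Buffer(InputTable),
        //Combine the two columns row by row in a single pass over the table
        Result = List.Transform(
            List.Zip({BufferTable[First], BufferTable[Last]}),
            each Text.Combine(_, " ")
        )
    in
        Result
)

When invoked via Invoke Custom Function with the First and Last columns as arguments, this still runs just once for the whole table.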

The result of calling this function on the table returned by the Product query:

Table.AddColumn(
Product,
"Fruit Upper",
each ScalarVectorUpper([Fruit])
)

…is the same as before, but potentially a lot more efficient:

[Image: the same table as before, with the Fruit Upper column added]

This may seem obscure, but personally I think this is incredibly useful and important. Anyone writing a custom connector in the future (although Function.ScalarVector() isn’t available in the current release of the Power Query SDK I assume it will appear soon) is likely to want to use it, and remember also that custom connectors can be used to create M function libraries and I believe this scenario will be officially supported too.

You can download the example pbix file for this post here.

19 thoughts on “Using Function.ScalarVector() To Optimise Calls To M Functions”

  1. Hi Chris, very interesting topic. The how-to is a bit too convoluted for what I believe should be a more straightforward feature…
    Have you tested whether this would have performance benefits locally vs regular function invocation?

  2. I think the point is that it does not have any inherent performance benefits, but it does allow you to implement your functions in a more efficient way than would otherwise be possible.

  3. Hi Chris

I have a query called FOLDER defined as below:

    let
Source = Folder.Files("C:\DATA"),
    mRemOthCols = Table.SelectColumns(Source,{"Folder Path", "Name"}),
    mAddCustCol = Table.Combine(Table.AddColumn(mRemOthCols, "Custom", each fGetDataSht([Folder Path],[Name]))[Custom])
    in
    mAddCustCol

I have a function called fGetDataSht defined as below:

    let
    fGetDataSht=(filePath as text, fName as text)=>
    let
    Source = Excel.Workbook(File.Contents(filePath & fName), true, true){0}[Data]
    in
    Source
    in
    fGetDataSht

Currently the function is getting invoked as many times as there are files in the folder.
    How can we use Function.ScalarVector in this case?

    1. You could rewrite your function using Function.ScalarVector but what do you want to achieve here? Unlike a web service there’s no way you can avoid opening Excel files individually. There might be some improvements possible if you’re getting data from multiple sheets in the same workbook though.

4. @Chris – I tried doing this, but I am doing something wrong. Can you kindly look at the below?

    Function.ScalarVector(
    type function(filePath as any,fName as any) as table,
    (InputTable) =>
    let
    BufferTable = Table.Buffer(InputTable),
    CallFunction = fGetDataSht(each BufferTable[Folder Path]{_},each BufferTable[Name]{_})
    in
    CallFunction
    )

      1. To be clear: the aim here is to speed up reading data from multiple sheets in the same Excel workbook, right? Are you sure Power BI is opening the same workbook multiple times in this scenario (I don’t know – I would have to test if you weren’t sure)?

  5. Hi Chris, when is next power bi meet in London? How can I join it? Thanks & Regards, Bhanu Singh


  6. I’ve created a really primitive example at https://gist.github.com/CurtHagenlocher/498d68282f0d355fc55ef7ecb563805b which uses the Cognitive Services languages API to detect language. Even though it uses Table.AddColumn to detect language in a single row, it leverages the vectorization support to make only one API call per 1000 input rows.

    Note that I’m only buffering each page of 1000 values instead of buffering the entire column. This avoids having to enumerate the input table twice. Most of the magic in the implementation is there to let us stitch together multiple enumerations to avoid having to buffer too much. To take advantage of this streaming capability, you need to use something like List.Split to separate a list into a streaming-capable list of lists and then List.Combine to produce a single list from the output.

7. @Chris – Could you kindly point out how consolidation of data in multiple sheets of a single file can be optimised using Function.ScalarVector?

  8. I built a very small function to assist developers in identifying each row’s position within each “batch” of data loaded (or previewed), and therefore you can deduce the size of each batch:

    = Function.ScalarVector(
    type function(Any as any) as number,
    (InputTable as table) as list => List.Positions(InputTable[Any])
    )

    For example, in a large (unbuffered) table, within the Power Query Editor’s preview, I was able to see the first 1,000 rows were processed, and as I scrolled down past the 1,000th record, it seemed 50 rows were processed at a time. Maybe this would be useful in debugging complex queries with streaming datasets!

    -Alex

9. Hi Chris, this has been a very useful post for a complex problem I was facing for the past week, and I would like to share something interesting about the approach you showed here.

    My Problem: Retrieve data from a Web Service for an Object and apply a complex custom function (14 arguments) to transform the data in a table for each object (the table has the same structure for all the objects)

Limitation: the API web service uses a unique URL for each object of interest. This means that for any list of objects, an individual API call needs to be executed.

Not-so-good solution: a table with the listed objects and corresponding URLs. Invoke the custom function using the Add Column functionality and then expand the resulting table.
    Performance: several hours to load the query, which contains several hundred thousand rows, into the report. Also, when looking at the Power Query diagnostics report, I had thousands of rows.

New solution: At first I was not sure how to implement your example here, especially as the web has very limited information on this “obscure” Function.ScalarVector. The key wall I hit was how to pass multiple arguments to my function, but I managed to make it work by creating a column in my table where each row contains the parameters of my function for each object.
    New performance: loading time in the report has been reduced drastically, to about an hour and a half. Also, when looking at the diagnostics, the number of rows is in the 400s, which reflects that the data is not being accessed several times, nor are certain steps being repeated as was the case in the previous approach.

    1) First step was to convert my complex function

    let MyFunction = (Arg1 as text, Arg2 as number, … Arg14) =>
    let
    function = Anything
    in
    function
    in
    MyFunction

    into a function that can take list as arguments

let MyFunction_ListArg = (Parameters as list) =>
    let
    function = List.Transform(Parameters, (_) => MyFunction(_{0},_{1},…,_{13}))
    in
    function
    in
    MyFunction_ListArg

    2) Create the ScalarVectorMyFunction
let ScalarVectorMyFunction = Function.ScalarVector(type function(Parameters as text) as text,
    (InputTable) =>
    let
    BufferTable = Table.Buffer(InputTable),
    InputList = BufferTable[Parameters],
    CallFunction = MyFunction_ListArg(InputList)
    in
    CallFunction)

The next thing that would make this function fully modular would be making the column name used in Function.ScalarVector dynamic; at the moment it is hardcoded to match the column name in my table.
    If you have any ideas on how to sort this out, that would be great.

    Cheers
    J
