Parquet Files In Power BI/Power Query And The “Streamed Binary Values” Error

If you’re using the new Parquet connector in Power BI there’s a chance you will run into the following error:

Parameter.Error: Parquet.Document cannot be used with streamed binary values.
Details:
[Binary]

This isn’t a bug or anything that can be fixed, so it’s important to understand why it occurs and what you can do about it.

One easy way to reproduce this problem is by trying to access a reasonably large (larger than a few MB) Parquet file stored in SharePoint, something like this:

let
  Source = SharePoint.Files(
    "https://microsoft-my.sharepoint.com/personal/abc",
    [ApiVersion = 15]
  ),
  GetFile = Source
    {
      [
        Name = "myfile.parquet",
        #"Folder Path"
          = "https://microsoft-my.sharepoint.com/personal/abc/Documents/"
      ]
    }
    [Content],
  #"Imported Parquet" = Parquet.Document(
    GetFile
  )
in
  #"Imported Parquet"

The problem is that reading data from Parquet files requires random file access, and this is something that isn’t possible in Power Query for certain data sources like SharePoint and Google Cloud Storage. This problem will never occur with locally-stored files or files stored in ADLSgen2.

There is one possible workaround but it comes with some serious limitations: buffer the Parquet file in memory using the Binary.Buffer() M function. Here’s an example of how the above query can be rewritten to do this:

let
  Source = SharePoint.Files(
    "https://microsoft-my.sharepoint.com/personal/abc",
    [ApiVersion = 15]
  ),
  GetFile = Source
    {
      [
        Name = "myfile.parquet",
        #"Folder Path"
          = "https://microsoft-my.sharepoint.com/personal/abc/Documents/"
      ]
    }
    [Content],
  #"Imported Parquet" = Parquet.Document(
    Binary.Buffer(GetFile)
  )
in
  #"Imported Parquet"

The problem with buffering files in memory like this is that it’s only feasible for fairly small files because of the limits on the amount of memory Power Query can use (see here for more information): you’re likely to get really bad performance or errors if you try to buffer files that are too large, and Parquet files are often fairly large. The best way of solving this problem is to switch to using a data source like ADLSgen2 where this problem will not happen.

[Thanks to Eric Gorelik for the information in this post]

3 responses

    • All I know is that it’s down to file random access. There’s a big difference between having a file system where you can access the contents of a file easily and a cloud service that can only give you a binary.

  1. Pingback: Power BI: Parquet Files and Streamed Binary Values Error – Curated SQL

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: