Parquet Files In Power BI/Power Query And The “Streamed Binary Values” Error

If you’re using the new Parquet connector in Power BI, there’s a chance you’ll run into the following error:

Parameter.Error: Parquet.Document cannot be used with streamed binary values.
Details:
[Binary]

This isn’t a bug or anything that can be fixed, so it’s important to understand why it occurs and what you can do about it.

One easy way to reproduce the problem is to try to access a reasonably large (more than a few MB) Parquet file stored in SharePoint, with a query something like this:

let
  Source = SharePoint.Files(
    "https://microsoft-my.sharepoint.com/personal/abc",
    [ApiVersion = 15]
  ),
  GetFile = Source
    {
      [
        Name = "myfile.parquet",
        #"Folder Path"
          = "https://microsoft-my.sharepoint.com/personal/abc/Documents/"
      ]
    }
    [Content],
  #"Imported Parquet" = Parquet.Document(
    GetFile
  )
in
  #"Imported Parquet"

The problem is that reading data from a Parquet file requires random file access: a Parquet reader has to be able to seek around the file (the metadata footer is stored at the end of the file, for example), and for certain data sources, such as SharePoint and Google Cloud Storage, Power Query can only get the file’s contents as a sequential stream of bytes. This problem will never occur with locally-stored files or with files stored in ADLSgen2, because those sources do support random access.

There is one possible workaround, although it comes with some serious limitations: buffer the Parquet file in memory using the Binary.Buffer() M function. Here’s how the query above can be rewritten to do this:

let
  Source = SharePoint.Files(
    "https://microsoft-my.sharepoint.com/personal/abc",
    [ApiVersion = 15]
  ),
  GetFile = Source
    {
      [
        Name = "myfile.parquet",
        #"Folder Path"
          = "https://microsoft-my.sharepoint.com/personal/abc/Documents/"
      ]
    }
    [Content],
  #"Imported Parquet" = Parquet.Document(
    Binary.Buffer(GetFile)
  )
in
  #"Imported Parquet"

The problem with buffering files in memory like this is that it’s only feasible for fairly small files, because of the limits on the amount of memory Power Query can use (see here for more information): if you try to buffer a file that’s too large you’re likely to get very poor performance or out-of-memory errors, and Parquet files are often fairly large. The best way to solve the problem is to switch to a data source, such as ADLSgen2, where it doesn’t happen.
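For comparison, here’s a minimal sketch of what the equivalent query might look like against ADLSgen2, using the AzureStorage.DataLake connector. The account, container and file names below are placeholders and you’d need the appropriate credentials, but the key point is that no call to Binary.Buffer() is required:

let
  // Connect to an ADLSgen2 container - the URL is a placeholder
  Source = AzureStorage.DataLake(
    "https://myaccount.dfs.core.windows.net/mycontainer"
  ),
  // Select the Parquet file from the table of files returned
  GetFile = Source
    {
      [
        Name = "myfile.parquet",
        #"Folder Path"
          = "https://myaccount.dfs.core.windows.net/mycontainer/"
      ]
    }
    [Content],
  // No buffering needed: ADLSgen2 supports random file access
  #"Imported Parquet" = Parquet.Document(
    GetFile
  )
in
  #"Imported Parquet"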

[Thanks to Eric Gorelik for the information in this post]

5 responses

    • All I know is that it’s down to random file access. There’s a big difference between a file system where you can read the contents of a file at any position you like and a cloud service that can only give you a binary stream.

  1. Pingback: Power BI: Parquet Files and Streamed Binary Values Error – Curated SQL

  2. Actually, cloud/HTTP streams like Google Cloud Storage or AWS S3 do support random access if the client can make use of the Range header – see, for example, the GCP documentation: https://cloud.google.com/storage/docs/xml-api/reference-headers#range
    The challenge here seems to be that Web.Contents and Parquet.Document do not allow the use of Range on the fly. Ideally, Parquet.Document() would accept a URL too and read the bytes as necessary through the Web/HTTP interface (see the sketch after these comments).

  3. Thanks for posting this – it’s a problem we’re having using Parquet on AWS S3 and connecting to Power BI via an API Gateway. Our current workaround is to save as CSV instead, which is not ideal. Do you know if Microsoft is going to address this at some stage for sources other than ADLSgen2?
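To illustrate the point in comment 2: Web.Contents does let you request a byte range of a file manually, since Range is one of the headers it supports, but Parquet.Document has no way to issue those ranged reads itself. Here’s a minimal sketch, assuming a hypothetical publicly-readable file in Google Cloud Storage, that fetches just the first kilobyte of the file:

let
  // Request only the first 1024 bytes of the file via the HTTP Range
  // header (the bucket and file names are hypothetical)
  FirstBytes = Web.Contents(
    "https://storage.googleapis.com/mybucket/myfile.parquet",
    [Headers = [Range = "bytes=0-1023"]]
  )
in
  FirstBytes

This returns a small binary value, but because Parquet.Document can’t ask for further ranges on demand, the whole file still has to be downloaded and buffered before it can be read.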
