An In-Depth Look At The Csv.Document M Function

Chris Webb Analysis Services, Azure Analysis Services, M, Power BI, Power Query March 9, 2018January 18, 2020 12 Minutes

CSV files are one of the most commonly used data sources in Power BI and Power Query/Get&Transform, and yet the documentation for the Csv.Document M function is very limited and in some cases incorrect. In this rather long post I’ll show you as many of the capabilities of this useful function as I’ve been able to discover.

The Source parameter

The Csv.Document function returns a table, and the first (and only non-optional) parameter of this function is the source data in CSV format. Normally this is a binary value returned by the File.Contents function. For example take the this simple CSV file with no column headers and one row of data:

The following M code uses File.Contents to read the contents of the file, and then passes the contents to Csv.Document to be interpreted as a CSV file:

[sourcecode language=”text” padlinenumbers=”true”]
let
Source = File.Contents("C:\CSVTests\SourceFile.csv"),
ToCSV = Csv.Document(Source)
in
ToCSV
[/sourcecode]

The output is this:

However it is also possible to pass text to the first parameter of Csv.Document too, for example:

[sourcecode language=”text”]
let
SourceText = "February,Oranges,2",
ToCSV = Csv.Document(SourceText)
in
ToCSV
[/sourcecode]

The output of this query is:

In both of these examples I’m relying on the default behaviour of the Csv.Document function with regard to delimiters and other properties, which I’ll explain in more detail below.

Using a record in the second parameter

The second parameter of Csv.Document can be used in several different ways. In code generated by the Query Editor UI it usually takes the form of a record, and the different fields in the record specify how the function behaves in different scenarios. For example, if you connect to the CSV file shown above by selecting the Text/CSV source in the Query Editor UI, you’ll see the following window appear showing a preview of the data and three options:

This results in the following M query:

[sourcecode language=”text” padlinenumbers=”true” highlight=”4,5,6,7,8,9″]
let
Source =
Csv.Document(
File.Contents("C:\CSVTests\SourceFile.csv"),
[
Delimiter=",",
Columns=3,
Encoding=1252,
QuoteStyle=QuoteStyle.None
]),
#"Changed Type" =
Table.TransformColumnTypes(
Source,
{
{"Column1", type text},
{"Column2", type text},
{"Column3", Int64.Type}
})
in
#"Changed Type"
[/sourcecode]

The query above shows the Csv.Document function with a record in its second parameter containing four fields: Delimiter, Columns, Encoding and QuoteStyle. There is also a fifth field that can be added to the record, CsvStyle, but this cannot be set anywhere in the UI.

The Data Type Detection option shown in the screenshot gives you three options for detecting the data types in each column of your file: by default it looks at the first 200 rows in the dataset, but you can also ask it to look at the entire dataset (which may be slower) or not to detect data types at all, in which case it will treat all columns as text. In this case data types are not set in the Csv.Document function but in the #”Changed Type” step with the Table.TransformColumnTypes function, but as we will see later it is possible to set column names and data types in a single step with Csv.Document instead.

The Encoding field

The File Origin dropdown menu shown above corresponds to the Encoding field in the Csv.Document function. This integer value specifies the code page used to encode the contents of the file:

In the M query in the previous section the 1252 code page is set explicitly. The following M query sets the (incorrect) 1200 code page for the CSV file shown above:

[sourcecode language=”text”]
let
Source = File.Contents("C:\CSVTests\SourceFile.csv"),
ToCSV = Csv.Document(Source,[Encoding=1200])
in
ToCSV
[/sourcecode]

…with the following result:

The Delimiter field

The Delimiter dropdown allows you to specify the delimiter used to separate the columns in each row of data. There are a number of options available through the UI, including commas and tabs, and the Custom option allows you to enter your own delimiter:

If you specify a single character delimiter at this point then the Delimiter field of the record in the second parameter of Csv.Document is set; the Custom and Fixed Width options shown here use a different form of the Csv.Document function described below. If the Delimiter record is not set then a comma is used as the delimiter. If you want to use a special character like a tab then you need to use an M escape sequence; for example to use a tab character as delimiter you need to use the text “#(tab)” which returns a text value containing just a single tab character.

For example, the following query:

[sourcecode language=”text”]
let
Source = "123a456a789",
ToCSV = Csv.Document(Source, [Delimiter="a"])
in
ToCSV
[/sourcecode]

Returns:

And this query:

[sourcecode language=”text”]
let
Source = "789#(tab)456#(tab)123",
ToCSV = Csv.Document(Source, [Delimiter="#(tab)"])
in
ToCSV
[/sourcecode]

Returns:

The Columns field

The Columns field specifies the number of columns in the table returned by Csv.Document, regardless of how many columns are actually present in the source data. For example, the following query:

[sourcecode language=”text”]
let
Source = "a,b,c",
ToCSV = Csv.Document(Source, [Delimiter=",", Columns=3])
in
ToCSV
[/sourcecode]

…returns a table with three columns:

While the following query returns a table with four columns, even though only three columns are present in the data:

[sourcecode language=”text”]
let
Source = "a,b,c",
ToCSV = Csv.Document(Source, [Delimiter=",", Columns=4])
in
ToCSV
[/sourcecode]

And the following query returns a table with only two columns, discarding the third column of data present in the data:

[sourcecode language=”text”]
let
Source = "a,b,c",
ToCSV = Csv.Document(Source, [Delimiter=",", Columns=2])
in
ToCSV
[/sourcecode]

The Columns field is not explicitly set by the user when you first connect to a CSV file via the UI, but the UI infers its value from the number of columns it finds in the CSV file. This can be a problem if the number of columns in the source data changes in the future because, as shown above, when the Columns field is set the table returned always has a fixed number of columns. As a result if the number of columns in the data source increases in the future you will find columns on the right-hand side of the table are not returned; similarly if the number of columns decreases you’ll see unwanted empty columns. Several people such as Prathy Kamasani have blogged about this problem and it may be better to delete the Columns field from the record, or not set the Columns field in the first place, in order to avoid it. If you do not set the Columns field then Csv.Document returns a table with the number of columns that are present in the first row of your source data.

The QuoteStyle field

The QuoteStyle field can take two possible values of type QuoteStyle: QuoteStyle.None and QuoteStyle.Csv. Here’s what the built-in documentation has to say about the QuoteStyle type:

While the value for QuoteStyle is set automatically when you connect to a file, if you edit a step in the Query Editor that uses Csv.Document you can change this value in the UI in the Line Breaks dropdown shown here:

As the screenshot above suggests this field controls whether line breaks inside text values are respected. For both QuoteStyle.None and QuoteStyle.Csv, if you wrap a text value inside double quotes those double quotes are used to indicate the start and the end of the text value and are not shown in the output; if you want a double quote to appear, you have to double it up. However if QuoteStyle.None is set then line breaks are always respected, even if they appear inside double quotes; if QuoteStyle.Csv is set, then line breaks inside double quotes are ignored. Take the following CSV file for example:

The following M query, using QuoteStyle.None:

[sourcecode language=”text”]
let
Source = File.Contents("C:\CSVTests\SourceFileWithQuotes.csv"),
ToCSV = Csv.Document(Source,[QuoteStyle=QuoteStyle.None])
in
ToCSV
[/sourcecode]

…returns the following table with two rows in it:

Whereas the following M query, using QuoteStyle.Csv:

[sourcecode language=”text”]
let
Source = File.Contents("C:\CSVTests\SourceFileWithQuotes.csv"),
ToCSV = Csv.Document(Source,[QuoteStyle=QuoteStyle.Csv])
in
ToCSV
[/sourcecode]

…returns a table with just one row, and a line break present in the text value in the first column:

The CsvStyle field

The final field that can be used, CsvStyle, is also related to quotes. It can take one of two values of type CsvStyle: Csv.QuoteAfterDelimiter and CsvStyle.QuoteAlways.

If the CsvStyle field is not set, the default is CsvStyle.QuoteAlways. Consider the following CSV file:

Notice that on the second line there is a space after the comma. The following M query:

[sourcecode language=”text” padlinenumbers=”true”]
let
Source =
File.Contents("C:\CSVTests\SourceFileWithQuotes.csv"),
ToCSV =
Csv.Document(
Source,
[CsvStyle=CsvStyle.QuoteAlways])
in
ToCSV
[/sourcecode]

Returns this, because the space before the comma is not treated as significant:

Whereas the following M query:

[sourcecode language=”text”]
let
Source =
File.Contents("C:\CSVTests\SourceFileWithQuotes.csv"),
ToCSV =
Csv.Document(
Source,
[CsvStyle=CsvStyle.QuoteAfterDelimiter])
in
ToCSV
[/sourcecode]

Returns the text “four” in double quotes on the second line, because the space before the comma on the second line changes how the double quotes are treated:

Using a list or a table type in the second parameter

If the first line of your CSV file contains column headers and you connect to the file using the Query Editor user interface, in most cases this will be detected and an extra step will be added to your query that uses Table.PromoteHeaders to use these values as the column headers. However if you don’t have column headers inside your CSV file, instead of a record it is also possible to supply a list of column names or even better a table type to define the columns present in your CSV file. When you do this Csv.Document has three other parameters that can be used to do some of the same things that are possible if you use a record in the second parameter – Delimiter, ExtraValues and Encoding – and they are described below.

For example, in the following CSV file there are three columns: Month, Product and Sales.

Using a list of text values containing these column names in the second parameter of Csv.Document, as in the following M query:

[sourcecode language=”text”]
let
Source = File.Contents("C:\CSVTests\SourceFile.csv"),
ToCSV = Csv.Document(Source,{"Month","Product","Sales"})
in
ToCSV
[/sourcecode]

Returns the following table:

This has set the column names correctly but the data types of the three columns are set to text. What if I know that only the Month and Product columns contain text and the Sales column should be a number? Instead of a list of column names, using a table type allows you to set names and data types for each column:

[sourcecode language=”text”]
let
Source = File.Contents("C:\CSVTests\SourceFile.csv"),
ToCSV = Csv.Document(
Source,
type table
[#"Month"=text, #"Product"=text, #"Sales"=number])
in
ToCSV
[/sourcecode]

Notice how now the Sales column has its data type set to number.

The Delimiter parameter

If you have used a list of column names or a table type in the second parameter of Csv.Document, you can use the third parameter to control how each row of data is split up into columns. There are two ways you can do this.

First of all, you can pass any piece of text to the third parameter to specify a delimiter. Unlike the delimiter field of the second parameter described above, this can be a single character or multiple characters. For example, the following M query:

[sourcecode language=”text” padlinenumbers=”true”]
let
Source = "abcdefg",
ToCSV = Csv.Document(Source,{"first","second"},"c")
in
ToCSV
[/sourcecode]

Returns:

And the following M query:

[sourcecode language=”text”]
let
Source = "abcdefg",
ToCSV = Csv.Document(Source,{"first","second"},"cd")
in
ToCSV
[/sourcecode]

Returns:

Instead of text, the Delimiter parameter can also take a list of integer values to allow you to handle fixed-width files. This functionality is available from the UI when you choose the Fixed Width option from the Delimiter dropdown box when you connect to a CSV file for the first time:

Each integer in the list represents the number of characters from the start of the row that marks the start of each column; as a result, each integer in the list has to be a larger than the preceding integer. The values are 0-based so 0 marks the start of a row. For example, the M query:

[sourcecode language=”text”]
let
Source = "abcdefg",
ToCSV = Csv.Document(Source,{"first","second","third"},{0,3,5})
in
ToCSV
[/sourcecode]

Returns:

The ExtraValues parameter

The ExtraValues parameter allows you to handle scenarios where there are extra columns on the end of lines. This isn’t quite as useful as it sounds though: most of the time when the number of columns varies in a CSV file it’s because there are unquoted line breaks in text columns, in which case you should make sure your source data always wraps text in double quotes and use the QuoteStyle option described above, or if you can’t fix your data source, see this post.

The ExtraValues parameter can take one of three values of type ExtraValues: ExtraValues.List, ExtraValues.Ignore and ExtraValues.Error.

Consider the following CSV file with two extra columns on the second row:

The following query reads data from this file:

[sourcecode language=”text” padlinenumbers=”true”]
let
Source = File.Contents("C:\CSVTests\SourceFile.csv"),
ToCSV = Csv.Document(Source,{"Month","Product","Sales"})
in
ToCSV
[/sourcecode]

As you can see from the screenshot below, because we have specified that there are three columns in the table, the error “There were more columns in the result than expected” is returned for each cell on the second line:

The same thing happens when ExtraValues.Error is explicitly specified in the fourth parameter, like so:

[sourcecode language=”text”]
let
Source = File.Contents("C:\CSVTests\SourceFile.csv"),
ToCSV =
Csv.Document(
Source,
{"Month","Product","Sales"},
",",
ExtraValues.Error
)
in
ToCSV
[/sourcecode]

If you set ExtraValues.Ignore instead, though:

[sourcecode language=”text”]
let
Source = File.Contents("C:\CSVTests\SourceFile.csv"),
ToCSV =
Csv.Document(
Source,
{"Month","Product","Sales"},
",",
ExtraValues.Ignore
)
in
ToCSV
[/sourcecode]

The extra columns are ignored and no errors are returned:

Setting ExtraValues.List allows you to capture any extra column values in a list; however, if you want to do this you will need to add an extra column to your table to hold these values. For example, notice in this query that four columns rather than three have been defined:

[sourcecode language=”text”]
let
Source = File.Contents("C:\CSVTests\SourceFile.csv"),
ToCSV =
Csv.Document(
Source,
{"Month","Product","Sales","Extra Columns"},
",",
ExtraValues.List)
in
ToCSV
[/sourcecode]

The output looks like this:

On the first and third rows the Extra Columns column contains an empty list. On the second row, however, the Extra Columns column contains a list containing two values – the two values from the two extra columns on that line.

The Encoding parameter

The Encoding parameter corresponds directly to the Encoding field used when you pass a record to the second parameter, as described above. The one difference is that it can take an integer or a value of type TextEncoding, although the TextEncoding data type only contains values for some of the more common code pages so the only reason to use it would be for readability:

As a result, the following two M queries:

[sourcecode language=”text”]
let
Source = File.Contents("C:\CSVTests\SourceFile.csv"),
ToCSV = Csv.Document(
Source,
{"Month","Product","Sales"},
",",
ExtraValues.Ignore,
TextEncoding.Windows
)
in
ToCSV
[/sourcecode]

…return the same result.

What about CsvStyle and QuoteStyle?

If you specify a list of column names or a table type in the second parameter of Csv.Document there’s no way to set CsvStyle or QuoteStyle – these options are only available when you use a record in the second parameter. The behaviour you get is the same as CsvStyle.QuoteAlways and QuoteStyle.Csv, so with the following source data:

This M query:

[sourcecode language=”text”]
let
Source = File.Contents("C:\CSVTests\SourceFileWithQuotes.csv"),
ToCSV = Csv.Document(
Source,
{"Month","Sales"},
",",
ExtraValues.Ignore,
1252)
in
ToCSV
[/sourcecode]

returns:

Tagged
Csv.Document

Published by Chris Webb

My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel. View all posts by Chris Webb

Published March 9, 2018January 18, 2020

23 thoughts on “An In-Depth Look At The Csv.Document M Function”

Harj says:

March 9, 2018 at 9:50 am

The title was correct – it’s very in-depth. Great post 🙂

Loading...

Reply
Anthony says:

March 12, 2018 at 8:41 am

Great post.
It is this kind of documentation that we should get from MS from the begining. Thank you.

Loading...

Reply
Cassian Rozario says:

March 13, 2018 at 9:20 am

Dear Sir,
Your help is needed.
Whenever I try to manage a relationship in two files between two columns I get the message
“You can’t create a relationship between these two columns because one of the columns must have unique values.”
Please note that the values are truly unique in the columns but somehow the above message is displayed the moment I select even one column.
Then a black error screen appears and the Power BI application powers off.
regards

Loading...

Reply
1. Chris Webb says:
  
  March 14, 2018 at 8:44 pm
  
  This sounds like a bug in Power BI unfortunately. You should report it at https://community.powerbi.com/t5/Issues/idb-p/Issues
  
  Loading...
  
  Reply
Pingback: The Binary.InferContentType M Function – Chris Webb's BI Blog
AngelaW says:

April 24, 2019 at 5:00 pm

Thank you for the article. I’ve been searching and searching…and searching for the different source methods to use for ‘Get Data’ from SharePoint. My files are stored in a Document Library (csv files) on SharePoint. Via Desktop you can always refresh with no issues but the issue arises when Publishing to the Service IF you aren’t connecting via SharePoint Online. Yet nowhere can I find documentation on how to connect to SP Online (except for numerous articles on a SharePoint List). I don’t have a list. I have files under a folder structure and the point and click via Get Data for Online Services doesn’t work with that.
What I’ve discovered is there is way – but no walk-thru (Point and Click method via Get Data) instructions out there. You have to know this M Language!!
Your example above is using ” Source = File.Contents(“C:\CSVTests…”) which means it’s pointing to your internal C drive for remembering where to go. This does not work when publishing to the Service – and then apparently a Gateway is needed (and if you’re in a company like mine, IT has locked up those gateways!)
I was using similar to your above which was using my C drive:
“Source = Csv.Document(File.Contents(“C:\Users\ThisIsMe\MyCompany\EE Global PMO – PS Document Library\Metric_Collection_ROI.csv”),[Delimiter=”,”, Columns=35, Encoding=1252, QuoteStyle=QuoteStyle.Csv]),”

What seems to work (allowing me to publish to the Service and the SharePoint Refresh works without a Gateway is this:
” Source = SharePoint.Files(“https://MyCompanyName.sharepoint.com/sites/EEGlobalPMO”, [ApiVersion = 15]),
#”Metric Collection” = Source{[Name=”Metric_Collection_ROI.csv”,#”Folder Path”=”https://MyCompanyName.sharepoint.com/sites/EEGlobalPMO/PS Document Library/”]}[Content],
#”Imported CSV” = Csv.Document(#”Metric Collection”,[Delimiter=”,”, Columns=35, Encoding=1252, QuoteStyle=QuoteStyle.Csv]),

I honestly can’t tell you how I figured this out (Since I don’t know “M”) and I’m not sure this is how I should be getting the data – but it works, the SharePoint Refresh in Services likes it! More articles and insights on the Source methods with the Csv.document would be greatly appreciated.

Loading...

Reply
Damian says:

June 18, 2019 at 11:11 pm

Hi Chris,
Grate article thanks! One question please,

How can I pass a table or list as a source parameter?

Thank you!

Loading...

Reply
1. Chris Webb says:
  
  July 4, 2019 at 9:57 am
  
  As a parameter to Csv.Document? You can’t.
  
  Loading...
  
  Reply
veemvve says:

July 11, 2019 at 6:31 pm

Great post!!!

I have a question, my source data has new line chars as part of Column (Comment as shown below) values and I followed table type as part of 2nd parameter. For few records the new line chars are ignored and for few its considered and data gets scrambled.

let
Source = #”SrcConnection”,
#”tbl1 csv” = Source{[Name=”tbl.csv”]}[Content],
#”Imported CSV” = Csv.Document(#”tbl1 csv”, type table [#”AuditCreatedBy”=text,#”AuditCreatedDate”=datetime,#”AuditModifiedBy”=text,#”AuditModifiedDate”=datetime,#”Comment”=text,#”Id”=Int64.Type], “`”, ExtraValues.Ignore, 1252),
#”Promoted Headers” = Table.PromoteHeaders(#”Imported CSV”, [PromoteAllScalars=true])
in
#”Promoted Headers”

Loading...

Reply
Jason D. says:

April 6, 2020 at 5:43 pm

Has the “type table” function been deprecated? I cannot get it to work. Every time I use the following syntax:

type table
[#”ColumnName”=text]

I get the this error in the query editor:

Expression.Error: We cannot convert Type to Text type.
Details:
Value=[Type]
Type=[Type]

Loading...

Reply
1. Chris Webb says:
  
  April 6, 2020 at 7:32 pm
  
  It hasn’t been deprecated, no. There must be something wrong elsewhere in your code…?
  
  Loading...
  
  Reply
Joe Romero says:

April 20, 2020 at 8:22 am

Hi Chris, loved the article.

Have you ever come across a scenario where Power Query creates a line break in a field value instead of a new record for a fixed width delimited text file containing values with double quotes? For example, the first two rows of the dataset below have double quotes:

1 Item 1 12″ thing 40.00 Orange
2 Item 2 15″ thing 60.00 Blue
3 Item 3 Other thing 12.99 Yellow
4 Item 4 Other thing #2 11.00 Green

I’m getting some bizarre results when I run the following script:
let
Source =
Csv.Document(
File.Contents( “D:\Users\Documents\myTextFile.txt”),
4,
{0,16,37,47},
null,
65001
)
in
Source

Results:
Col1 Col2 Col3 Col4
1 Item 1 12″ thing 40.00 Orange
Item 2 15″ thing 60.00 Blue
2 Item 3 Other thing 12.99 Yellow
3 Item 4 Other thing #2 11.00 Green

In the first record, Col4 value has a line break, which is followed by Item 2 and all of it’s attribute values – all in one “cell”. It’s like PQ doesn’t see the line break as a new record when double quotes are present. I know by passing in a record for the columns argument you can set the QuoteStyle.None, but passing in a record isn’t possible with fixed width delimiters since the Delimiter in the record only accepts a text value, not a list of positional split numbers.

I’ve had a work around. It’s basically bringing in as 1 column, no delimiters, and running a Table.Split function. However, the processing time seems to be much longer (processing several millions of rows). I’m wondering if this is something you’ve ever encountered, if it’s just a bug, or if I’m missing something. Thanks!

Loading...

Reply
1. Chris Webb says:
  
  April 20, 2020 at 4:08 pm
  
  I haven’t seen this, no – it sounds like you have explored all the options there are to explore, so maybe it’s a bug?
  
  Loading...
  
  Reply
Neal Robinson says:

May 14, 2020 at 1:10 pm

Chris . . incredibly incredibly helpful . . if you could extend this standard of documentation to the remainder of Microsoft estate (sic) . . . . !!

Loading...

Reply
john says:

February 4, 2021 at 11:50 am

Thank you Chris for this and your continuing series of tips and hints and insights.
Is there a way to enforce no delimiter?
I don’t want the default comma or anything else!

Loading...

Reply
1. Chris Webb says:
  
  February 4, 2021 at 3:34 pm
  
  What do you mean by no delimiter?
  
  Loading...
  
  Reply
Jerry Kuo says:

April 9, 2021 at 5:27 am

Many thanks for your kindly post ! Is there any way to filter csv file before loading data ?

Loading...

Reply
1. Chris Webb says:
  
  April 9, 2021 at 9:31 am
  
  Yes, that’s exactly what Power Query allows you to do
  
  Loading...
  
  Reply
Ryan says:

July 26, 2021 at 9:27 pm

How would you parse out a string in TSV format? Im trying to quickly format output from a Snowflake query using Power Query.

Loading...

Reply
Todor says:

November 3, 2022 at 2:38 pm

HI, Chris!
Would be possible to vary with delimiter? In case that we have lots of .csv files with different exports – one part with comma, other with #(tab) we need to identify what kind of separator need to be used to load it. It’s quite the same with other parameters.

Best regards
Todor

Loading...

Reply
Rick de Groot says:

May 28, 2023 at 1:33 pm

Chris,

Happy to see such elaborate documentation on the lesser known functions. I’m trying to figure out the difference for QuoteStyle.Csv and QuoteStyle.None.

I tried replicating your example, but unfortunately didn’t get it too work.

Then cooked up below example, and here too it swapping around the different quotestyle does not have an impact.

= Csv.Document("Name,Age,Location#(lf)""John, Doe"",30,New York#(lf)""Jane, Doe"",28,""""Los Angeles, California", [Delimiter=",", Columns=3, Encoding=65001, QuoteStyle=QuoteStyle.Csv ])

How can we revise the code so it can illustrate the different behaviors?

Thanks, Rick

Loading...

Reply
1. Rick de Groot says:
  
  May 30, 2023 at 1:50 pm
  
  Chris, In the meanwhile I managed to solve this. The following snippets give different results:
  
  = Lines.FromText( "This is a ""string, in quotes""", QuoteStyle.Csv )
  
  vs
  
  = Lines.FromText( "This is a ""string, in quotes""", QuoteStyle.None)
  
  Explained it in more detail here.
  https://powerquery.how/quotestyle-csv/
  
  Also fun to know. I’m intending to include quite some of your articles in the powerquery.how documentation. Your website is an extensive research, and I’m not intending to write new articles for where the are already good ones.
  
  Cheers, Rick
  
  Loading...
  
  Reply
Pingback: Крис Уэбб. Функция Csv.Document M Power Query