More Details On Creating Tables In Power BI/Power Query M Code Using #table()

About two years ago I wrote a blog post describing how the #table M function can be used to generate tables, but in that post I only covered the functionality I used regularly – namely using #table with a list of column names or a table type in the first parameter. However, there are two other variations on #table that I have used recently and that I think are worth pointing out.

For example, if you need to generate a table with a set number of columns but you don’t care what the columns are called, you can use an integer in the first parameter to get a table with that number of columns. The following expression returns a table with four columns of data type Any called Column1, Column2, Column3 and Column4, and no rows:

#table(4,{})


Also, if you have a list of lists with an unknown number of items in each one, and you want to use each nested list for the row values in a table, you can use a null value in the first parameter of #table. The following expression returns a table with four columns like the one above, but with two rows of integer values:

#table(null, {{1,2,3,4},{2,3,4,5}})

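And for completeness, here are minimal sketches of the two variations I covered in the earlier post – a list of column names, and a table type that also sets column data types (the column names and values here are just examples):

//Returns a table with two text columns called Month and Product
#table({"Month","Product"}, {{"January","Apples"},{"February","Oranges"}})

//Returns a similar table, but with data types set on the columns
#table(
    type table [Month=text, Product=text, Units=number],
    {{"January","Apples",1},{"February","Oranges",2}})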

How To Tell Whether Query Folding Is Taking Place When Importing Data From Analysis Services In Power BI And Excel Power Query

As a quick follow-on from last week’s post on how to detect whether query folding is taking place when importing from OData data sources, if you’re importing data from Analysis Services you have a similar problem: how do you know whether query folding is taking place? Ensuring that query folding takes place for as many of the steps in your query – especially those that filter or otherwise reduce the amount of data returned – is very important for data refresh performance.

Although the Power Query engine generates MDX queries when importing from Analysis Services, in the same way it generates SQL queries when it imports from a relational database, the View Native Query option doesn’t work for Analysis Services data sources. You can of course use a Profiler trace or xEvents to see the MDX, but for most users that will not be an option for security reasons. The UI does tell you in another way, though. If query folding is taking place for a step, then the Cube Tools menu on the ribbon (with the Add Items and Collapse Columns buttons) will be available, and in the top right-hand corner of the table in the results area there will be a cube icon.

If query folding is not taking place for a step (even though it might be taking place for previous steps in the query) then the Cube Tools menu will not be visible, and the cube icon will be replaced by a table icon. For example, if you add an Index column to the query shown above, query folding stops taking place from that step on.
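
To illustrate the kind of step that breaks folding, here’s a minimal sketch, using a dummy table built with #table to stand in for a step that still folds to the cube:

let
    //Stand-in for a step that folds to Analysis Services; in a real query
    //this would come from an AnalysisServices.Databases data source
    FoldedStep = #table({"Product","Sales"}, {{"Apples",1},{"Oranges",2}}),
    //Adding an index column cannot be translated to MDX, so query folding
    //stops here and the step is evaluated by the Power Query engine
    AddedIndex = Table.AddIndexColumn(FoldedStep, "Index", 0, 1)
in
    AddedIndex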

[Thanks to Jure Jaklic for pointing this out]

Troubleshooting Data Refresh Performance Issues With OData Data Sources In Power BI And Excel Using Fiddler

A lot of people have problems with the performance of OData data sources when loading data into Power BI and Excel. One possible cause of these problems is query folding not taking place – if this is the case then the Power Query engine will be requesting more data than is needed and applying any filters itself locally, rather than requesting filtered data from the data source. How do you know whether query folding is taking place or not though? The Power Query Editor UI doesn’t give you this information, unfortunately. Last week, at the Data and BI Summit in Dublin, Matt Masson demonstrated how to use Fiddler to check if query folding is taking place when loading data from an OData data source and he’s very kindly allowed me to write about what he showed here.

The following examples use the UK Parliament’s public OData API, which is documented here. The M query below, generated using the Power Query Editor in Power BI Desktop, returns all the rows from the GovernmentOrganisation table from the API:

let
    Source = OData.Feed("https://api.parliament.uk/odata"),
    GovernmentOrganisation_table =
	Source{[
		Name="GovernmentOrganisation",
		Signature="table"
		]}[Data]
in
    GovernmentOrganisation_table


To monitor communication between Power BI Desktop and the OData API you will need to install Fiddler, a free tool from Telerik. You can download it here:

https://www.telerik.com/fiddler

You’ll probably also need to configure Fiddler to intercept https traffic, which you can find out how to do here. It’s a very powerful tool and I am by no means an expert in using it, but even a basic understanding of its features can be very useful for a Power BI developer.

With Fiddler running you can add a filter so that it only shows traffic to certain hosts; in this case I’m filtering to only show traffic to api.parliament.uk.

Refreshing the query shown above in the Query Editor results in a number of requests being shown in Fiddler.

There are several calls to the service root URL, but the important call is to get the top 1000 rows from the GovernmentOrganisation table:

https://api.parliament.uk/odata/GovernmentOrganisation?$top=1000

Notice the use of the $top query option to restrict the number of rows returned – the first example of the Power Query engine pushing a filter back to the data source. This only happens when you refresh the query in the Power Query Editor so you can see a sample of the data; when you click the Close and Apply button and load the query into your dataset you’ll see that this $top filter is not applied and all the rows from the table are requested.
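
Incidentally, if you want to request only a certain number of rows yourself, a row-count filter in M will usually fold in the same way. Here’s a minimal sketch – not code generated by the UI, and it assumes query folding is still possible at that point in the query:

let
    Source = OData.Feed("https://api.parliament.uk/odata"),
    GovernmentOrganisation_table =
	Source{[
		Name="GovernmentOrganisation",
		Signature="table"
		]}[Data],
    //A Table.FirstN step like this should fold to the $top query option
    TopRows = Table.FirstN(GovernmentOrganisation_table, 1000)
in
    TopRows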

Altering the query to filter the rows down to where the GroupName column equals “Cabinet Office” results in the following M query:

let
    Source = OData.Feed("https://api.parliament.uk/odata"),
    GovernmentOrganisation_table =
	Source{[
		Name="GovernmentOrganisation",
		Signature="table"
	]}[Data],
    #"Filtered Rows" =
	Table.SelectRows(
		GovernmentOrganisation_table,
		each ([GroupName] = "Cabinet Office")
	)
in
    #"Filtered Rows"

Fiddler shows the following call that includes the filter:


https://api.parliament.uk/odata/GovernmentOrganisation?$filter=GroupName%20eq%20'Cabinet%20Office'&$top=1000

[The Power Query Editor does quite a lot of caching so you may need to click the Refresh button on the ribbon to make sure it actually does call the API]

It can be quite difficult to work out what’s going on in a call like this, so with the row for that call selected in Fiddler you can go to the Inspectors tab and then the WebForms sub-tab, and it will show the different query options used.

Here you can see that the $filter query option has been used to do the filter, so query folding has taken place, and the API is only returning the one row that the query returns.

However it is all too easy to do something in your query that prevents query folding from happening. For example if you add an index column to the table before filtering by GroupName, as shown in this query:

let
    Source = OData.Feed("https://api.parliament.uk/odata"),
    GovernmentOrganisation_table =
	Source{[
		Name="GovernmentOrganisation",
		Signature="table"
	]}[Data],
    #"Added Index" =
	Table.AddIndexColumn(
		GovernmentOrganisation_table,
		"Index",
		0,
		1),
    #"Filtered Rows" =
		Table.SelectRows(
			#"Added Index",
			each ([GroupName] = "Cabinet Office"))
in
    #"Filtered Rows"

 


…then Fiddler shows that only the $top filter is being applied in the call to the API, not the $filter on GroupName, so query folding is no longer taking place for the filter step.

Not everything you do in your M queries can or will be folded back to an OData API, but in general you should aim to get query folding to take place on the steps in your query that reduce the amount of data returned by the API the most. Row filters like the one shown above are a prime example of the type of transformation that will need to be folded in order to get the best possible data refresh performance. You may need to experiment with reordering steps and applying transformations in different ways to get the Power Query engine to call the API in the way you want.
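
For example, swapping the order of the last two steps in the query above means the filter can fold again: filter first, then add the index column locally. A sketch using the same steps in the opposite order:

let
    Source = OData.Feed("https://api.parliament.uk/odata"),
    GovernmentOrganisation_table =
	Source{[
		Name="GovernmentOrganisation",
		Signature="table"
	]}[Data],
    //Filtering first means the filter can still fold to a $filter query option
    #"Filtered Rows" =
	Table.SelectRows(
		GovernmentOrganisation_table,
		each ([GroupName] = "Cabinet Office")),
    //The index column is then added locally to the filtered rows
    #"Added Index" =
	Table.AddIndexColumn(#"Filtered Rows", "Index", 0, 1)
in
    #"Added Index"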

Reordering Multiple Columns With ‘Remove Other Columns’ In The Power Query Editor

In the comments to my last blog post on how the order that you select columns can affect the output of certain calculations, both Bradley Sawler and John B. Thomas pointed out something very useful that I didn’t know about: that the order you select columns can also be used with the ‘Remove Other Columns’ functionality to reorder columns in bulk.

For example, imagine you have a table with columns called A, B, C, D, E and F. If you select the columns in the order F, E, D, A, B and C and then select ‘Remove Other Columns’, the columns are reordered in the order that you clicked them.

‘Remove Other Columns’ uses the Table.SelectColumns M function behind the scenes, and the order the columns are listed in that function is the order that you clicked them in. A great trick for reordering a large number of columns quickly!
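
Here’s a minimal sketch of the kind of code this generates, using a dummy table built with #table:

let
    Source = #table(
	{"A","B","C","D","E","F"},
	{{1,2,3,4,5,6}}),
    //'Remove Other Columns' generates a Table.SelectColumns step; the
    //columns are returned in the order they appear in this list
    #"Removed Other Columns" =
	Table.SelectColumns(Source, {"F","E","D","A","B","C"})
in
    #"Removed Other Columns"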

The Order You Select Columns In The Power Query Editor Can Affect The Output Of Some Transformations

Maybe this is obvious to more experienced Power Query users, but something I always point out when I’m training people up on Power Query is that the order that you select columns in the Power Query Editor window (both in Power BI Desktop and Excel) can affect the output of certain transformations. For example, say you have a table with two columns A and B that both contain numbers: if you select A first and then B, and then go to Add Column/Standard/Divide, you’ll get a new column that contains the value of the calculation A/B; if you select B first and then A and do the division, you’ll get B/A.

The order that you select columns is also significant for some other types of calculation such as Percent Of and Power, and also when you do a Merge Columns.
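
For illustration, here’s a sketch of the kind of code the Divide button generates, using a dummy table (the step and column names are my guesses at what the UI produces):

let
    Source = #table({"A","B"}, {{10,2}}),
    //Selecting A first and then B generates [A] / [B]; selecting B first
    //and then A would generate [B] / [A] instead
    #"Inserted Division" =
	Table.AddColumn(Source, "Division", each [A] / [B], type number)
in
    #"Inserted Division"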

Improving The Performance Of Aggregation After A Merge In Power BI And Excel Power Query/Get&Transform

A long time ago someone – probably from the Power Query dev team – told me that adding primary keys to M tables could improve the performance of certain transformations in Power BI and Excel Power Query/Get&Transform. Indeed, I mentioned this in chapter 5 of my book “Power Query for Power BI and Excel”, but it wasn’t until this week that I found a scenario where this was actually the case. In this post I’ll describe the scenario and try to draw some conclusions about when adding primary keys to tables might make a difference to performance.

Imagine you have two queries that return data from two different files. First of all there’s a query called #”Price Paid”, which reads data from a 140MB csv file containing some of my favourite demo data: all 836,933 property transactions in the year 2017 from the UK Land Registry Price Paid dataset. The query does nothing special – it just reads all the data, renames some columns, removes a few others, and sets data types on columns.
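
For illustration, here’s a sketch of roughly what such a query might look like – the file path and positional column names are assumptions, not the exact query I used:

let
    Source = Csv.Document(
	File.Contents("C:\PricePaid\pp-2017.csv"),
	[Delimiter=",", Encoding=1252, QuoteStyle=QuoteStyle.None]),
    //Rename the positional columns that are needed (assumed positions)
    #"Renamed Columns" = Table.RenameColumns(
	Source,
	{{"Column2", "Price Paid"}, {"Column5", "Property Type"}}),
    //Remove a column that isn't needed and set data types
    #"Removed Columns" = Table.RemoveColumns(#"Renamed Columns", {"Column1"}),
    #"Changed Type" = Table.TransformColumnTypes(
	#"Removed Columns",
	{{"Price Paid", Int64.Type}, {"Property Type", type text}})
in
    #"Changed Type"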

Second, you have a query called #”Property Types” that reads data from a very small Excel file and returns a five-row table.

The Property Type column from the #”Price Paid” query contains single-letter codes describing the type of property sold in each transaction, while the Property Type column from #”Property Types” contains a distinct list of the same codes and acts as a dimension table. Again, there’s nothing interesting going on in this query.

The problems start when you try to join data from these two queries using a Merge and then, for each row in #”Property Types”, show the sum of the Price Paid column from #”Price Paid”. The steps to do this are:

1. Click on the Merge Queries/Merge Queries as New button on the Home tab in the Power Query Editor.

2. In the Merge dialog, do a left outer join between #”Property Types” and #”Price Paid” on the Property Type column of each table.

3. Back in the Power Query Editor window, click on the Expand/Aggregate button in the top right-hand corner of the column created by the previous step.


4. Click on the Aggregate radio button and select Sum of Price Paid.

The output of the query is a table with one row for each property type and a column containing the sum of Price Paid.

Here’s the M code for this query, as generated by the UI:

let
    Source = 
	Table.NestedJoin(
		#"Property Types",
		{"Property Type"},
		#"Price Paid",
		{"Property Type"},"Price Paid",
		JoinKind.LeftOuter),
    #"Aggregated Price Paid" = 
	Table.AggregateTableColumn(
		Source, 
		"Price Paid", 
		{{"Price Paid", List.Sum, "Sum of Price Paid"}})
in
    #"Aggregated Price Paid"

It’s a very common thing to do, and in this case the query is extremely slow to run: on my laptop, when you refresh the preview in the Power Query Editor window in Power BI Desktop it takes 55 seconds. What’s more, in the bottom left-hand corner of the screen where it displays how much data is being read from the source file, it shows a value that is a lot more than the size of the source file. In fact it seems like the source file for #”Price Paid” is read once for each row in the #”Property Types” query: 140MB * 5 rows = 700MB.


Not good. But what if you specify that the Property Type column from the #”Property Types” query is the primary key of the table? After all it does contain unique values. Although it isn’t widely known, and it’s not shown in the UI, tables in M can have primary and foreign keys defined on them whatever data source you use (remember that in this case the data sources are csv and Excel). One way to do this is using Table.AddKey as follows:

let
    WithAddedKey = 
	Table.AddKey(
		#"Property Types", 
		{"Property Type"}, 
		true),
    Source = 
	Table.NestedJoin(
		WithAddedKey,
		{"Property Type"},
		#"Price Paid",
		{"Property Type"},
		"Price Paid",
		JoinKind.LeftOuter),
    #"Aggregated Price Paid" = 
	Table.AggregateTableColumn(
		Source, 
		"Price Paid", 
		{{"Price Paid", List.Sum, "Sum of Price Paid"}})
in
    #"Aggregated Price Paid"

And guess what? After making this change the query only takes 12 seconds to run instead of 55 seconds! What’s more, the amount of data read from disk shown in the UI suggests that the source file for #”Price Paid” is only read once. Making this change involves writing M code, though, and not everyone is comfortable making changes in the Advanced Editor. The good news is that it’s possible to get this performance benefit in another way without writing any M code.

As I mentioned in this blog post, using the Remove Duplicates transformation on a column has the side-effect of marking that column as a primary key. Therefore, if you right-click on the Property Type column and select Remove Duplicates in between steps 2 and 3 above, before you click on the Expand/Aggregate button, you also get the performance benefit. Here’s the M code generated by the UI for this query:

let
    Source = 
	Table.NestedJoin(
		#"Property Types",
		{"Property Type"},
		#"Price Paid",
		{"Property Type"},
		"Price Paid",
		JoinKind.LeftOuter),
    #"Removed Duplicates" = 
	Table.Distinct(
		Source, 
		{"Property Type"}),
    #"Aggregated Price Paid" = 
	Table.AggregateTableColumn(
		#"Removed Duplicates", 
		"Price Paid", 
		{{"Price Paid", List.Sum, "Sum of Price Paid"}})
in
    #"Aggregated Price Paid"

It’s #”Removed Duplicates” that is the new step here, and it uses the Table.Distinct function.

All this suggests that if you are doing merge operations on queries that get data from large csv or Excel files and then aggregating data, if you can set a primary key on one of the columns used in the join it can make a massive difference to performance. However this is just one example, and I would be really interested to hear if any of you can reproduce these findings with your own data and queries – if you can, please leave a comment. I suspect this could be a very important discovery for M query performance tuning!

An In-Depth Look At The Csv.Document M Function

CSV files are one of the most commonly used data sources in Power BI and Power Query/Get&Transform, and yet the documentation for the Csv.Document M function is very limited and in some cases incorrect. In this rather long post I’ll show you as many of the capabilities of this useful function as I’ve been able to discover.

The Source parameter

The Csv.Document function returns a table, and the first (and only non-optional) parameter of this function is the source data in CSV format. Normally this is a binary value returned by the File.Contents function. For example, take this simple CSV file with no column headers and a single row of data: a month name, a product name and a sales value.

The following M code uses File.Contents to read the contents of the file, and then passes the contents to Csv.Document to be interpreted as a CSV file:

let
    Source = File.Contents("C:\CSVTests\SourceFile.csv"),
    ToCSV = Csv.Document(Source)
in
    ToCSV

The output is a table with three columns called Column1, Column2 and Column3, containing the single row of data from the file.

It is also possible to pass text to the first parameter of Csv.Document, for example:

let
    SourceText = "February,Oranges,2",
    ToCSV = Csv.Document(SourceText)
in
    ToCSV

The output of this query is a single-row table with the values February, Oranges and 2 in its three columns.

In both of these examples I’m relying on the default behaviour of the Csv.Document function with regard to delimiters and other properties, which I’ll explain in more detail below.

Using a record in the second parameter

The second parameter of Csv.Document can be used in several different ways. In code generated by the Query Editor UI it usually takes the form of a record, and the different fields in the record specify how the function behaves in different scenarios. For example, if you connect to the CSV file shown above by selecting the Text/CSV source in the Query Editor UI, you’ll see a window with a preview of the data and three options: File Origin, Delimiter and Data Type Detection.

This results in the following M query:

let
    Source = 
	Csv.Document(
		File.Contents("C:\CSVTests\SourceFile.csv"),
		[
		 Delimiter=",", 
		 Columns=3, 
		 Encoding=1252, 
		 QuoteStyle=QuoteStyle.None
		]),
    #"Changed Type" = 
		Table.TransformColumnTypes(
			Source,
			{
			{"Column1", type text}, 
			{"Column2", type text}, 
			{"Column3", Int64.Type}
			})
in
    #"Changed Type"

The query above shows the Csv.Document function with a record in its second parameter containing four fields: Delimiter, Columns, Encoding and QuoteStyle. There is also a fifth field that can be added to the record, CsvStyle, but this cannot be set anywhere in the UI.
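
For reference, here’s a sketch of a record that sets all five fields at once, including CsvStyle (each field is explained in the sections below):

let
    Source = Csv.Document(
	File.Contents("C:\CSVTests\SourceFile.csv"),
	[
	 Delimiter=",",
	 Columns=3,
	 Encoding=1252,
	 QuoteStyle=QuoteStyle.None,
	 //CsvStyle can only be set by editing the M code manually
	 CsvStyle=CsvStyle.QuoteAlways
	])
in
    Source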

The Data Type Detection option gives you three choices for detecting the data types in each column of your file: by default it looks at the first 200 rows of the dataset, but you can also ask it to look at the entire dataset (which may be slower) or not to detect data types at all, in which case all columns are treated as text. In this case data types are not set in the Csv.Document function itself but in the #”Changed Type” step with the Table.TransformColumnTypes function; as we will see later, though, it is possible to set column names and data types in a single step with Csv.Document instead.

The Encoding field

The File Origin dropdown menu corresponds to the Encoding field in the Csv.Document function. This integer value specifies the code page used to encode the contents of the file.

In the M query in the previous section the 1252 code page is set explicitly. The following M query sets the (incorrect) 1200 code page for the CSV file shown above:

let
    Source = File.Contents("C:\CSVTests\SourceFile.csv"),
    ToCSV = Csv.Document(Source,[Encoding=1200])
in
    ToCSV

…with the result that the contents of the file are misinterpreted and garbled text is returned.

The Delimiter field

The Delimiter dropdown allows you to specify the delimiter used to separate the columns in each row of data. There are a number of options available through the UI, including commas and tabs, and the Custom option allows you to enter your own delimiter.

If you specify a single-character delimiter at this point then the Delimiter field of the record in the second parameter of Csv.Document is set; the Custom and Fixed Width options shown here use a different form of the Csv.Document function described below. If the Delimiter field is not set then a comma is used as the delimiter. If you want to use a special character like a tab then you need to use an M escape sequence; for example, to use a tab character as the delimiter you need the text “#(tab)”, which returns a text value containing a single tab character.

For example, the following query:

let
    Source = "123a456a789",
    ToCSV = Csv.Document(Source, [Delimiter="a"])
in
    ToCSV

Returns a table with three columns containing the values 123, 456 and 789.

And this query:

let
    Source = "789#(tab)456#(tab)123",
    ToCSV = Csv.Document(Source, [Delimiter="#(tab)"])
in
    ToCSV

Returns a table with three columns containing the values 789, 456 and 123.

The Columns field

The Columns field specifies the number of columns in the table returned by Csv.Document, regardless of how many columns are actually present in the source data. For example, the following query:

let
    Source = "a,b,c",
    ToCSV = Csv.Document(Source, [Delimiter=",", Columns=3])
in
    ToCSV

…returns a table with three columns.

While the following query returns a table with four columns, even though only three columns are present in the data:

let
    Source = "a,b,c",
    ToCSV = Csv.Document(Source, [Delimiter=",", Columns=4])
in
    ToCSV


And the following query returns a table with only two columns, discarding the third value present in the data:

let
    Source = "a,b,c",
    ToCSV = Csv.Document(Source, [Delimiter=",", Columns=2])
in
    ToCSV


The Columns field is not explicitly set by the user when you first connect to a CSV file via the UI, but the UI infers its value from the number of columns it finds in the CSV file. This can be a problem if the number of columns in the source data changes in the future because, as shown above, when the Columns field is set the table returned always has a fixed number of columns. As a result if the number of columns in the data source increases in the future you will find columns on the right-hand side of the table are not returned; similarly if the number of columns decreases you’ll see unwanted empty columns. Several people such as Prathy Kamasani have blogged about this problem and it may be better to delete the Columns field from the record, or not set the Columns field in the first place, in order to avoid it. If you do not set the Columns field then Csv.Document returns a table with the number of columns that are present in the first row of your source data.
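
For example, here’s a sketch of the query from earlier with the Columns field removed from the record, so that the number of columns returned is no longer fixed:

let
    Source = Csv.Document(
	File.Contents("C:\CSVTests\SourceFile.csv"),
	//With no Columns field, the number of columns is taken from the
	//first row of the file, so new columns show up automatically
	[Delimiter=",", Encoding=1252, QuoteStyle=QuoteStyle.None])
in
    Source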

The QuoteStyle field

The QuoteStyle field can take two possible values of type QuoteStyle: QuoteStyle.None and QuoteStyle.Csv.

While the value for QuoteStyle is set automatically when you connect to a file, if you edit a step in the Query Editor that uses Csv.Document you can change this value in the UI using the Line Breaks dropdown.

This field controls whether line breaks inside text values are respected. For both QuoteStyle.None and QuoteStyle.Csv, if you wrap a text value inside double quotes, those double quotes are used to indicate the start and the end of the text value and are not shown in the output; if you want a double quote to appear, you have to double it up. However, if QuoteStyle.None is set then line breaks are always treated as the start of a new row, even if they appear inside double quotes; if QuoteStyle.Csv is set, then line breaks inside double quotes are treated as part of the text value. Take, for example, a CSV file where the text value in the first column of a row is wrapped in double quotes and contains a line break.

The following M query, using QuoteStyle.None:

let
    Source = File.Contents("C:\CSVTests\SourceFileWithQuotes.csv"),
    ToCSV = Csv.Document(Source,[QuoteStyle=QuoteStyle.None])
in
    ToCSV

…returns a table with two rows in it, because the line break inside the double quotes starts a new row.

Whereas the following M query, using QuoteStyle.Csv:

let
    Source = File.Contents("C:\CSVTests\SourceFileWithQuotes.csv"),
    ToCSV = Csv.Document(Source,[QuoteStyle=QuoteStyle.Csv])
in
    ToCSV

…returns a table with just one row, with a line break present in the text value in the first column.

The CsvStyle field

The final field that can be used, CsvStyle, is also related to quotes. It can take one of two values of type CsvStyle: CsvStyle.QuoteAfterDelimiter and CsvStyle.QuoteAlways.

If the CsvStyle field is not set, the default is CsvStyle.QuoteAlways. Consider a CSV file with two rows, where the value in the second column of each row is wrapped in double quotes.

Notice that on the second line there is a space after the comma. The following M query:

let
    Source = 
	File.Contents("C:\CSVTests\SourceFileWithQuotes.csv"),
    ToCSV = 
	Csv.Document(
		Source,
		[CsvStyle=CsvStyle.QuoteAlways])
in
    ToCSV

Returns a table where the double quotes are not shown on either row, because the space after the comma is not treated as significant.

Whereas the following M query:

let
    Source = 
	File.Contents("C:\CSVTests\SourceFileWithQuotes.csv"),
    ToCSV = 
	Csv.Document(
		Source,
		[CsvStyle=CsvStyle.QuoteAfterDelimiter])
in
    ToCSV

Returns the text “four” in double quotes on the second line, because the space after the comma means the double quotes are no longer treated as marking the start and end of the text value.

Using a list or a table type in the second parameter

If the first line of your CSV file contains column headers and you connect to the file using the Query Editor user interface, in most cases this will be detected and an extra step will be added to your query that uses Table.PromoteHeaders to use these values as the column headers. However, if you don’t have column headers inside your CSV file, instead of a record you can supply a list of column names or, even better, a table type to define the columns present in your CSV file. When you do this Csv.Document has three other parameters that can be used to do some of the same things that are possible if you use a record in the second parameter – Delimiter, ExtraValues and Encoding – and they are described below.
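
As a point of comparison, here’s a sketch of the pattern the UI typically generates when a file does have a header row (the file name is hypothetical):

let
    Source = Csv.Document(
	File.Contents("C:\CSVTests\SourceFileWithHeaders.csv"),
	[Delimiter=",", Encoding=1252, QuoteStyle=QuoteStyle.None]),
    //Table.PromoteHeaders uses the values in the first row of the table
    //as the column names
    #"Promoted Headers" =
	Table.PromoteHeaders(Source, [PromoteAllScalars=true])
in
    #"Promoted Headers"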

For example, the CSV file used above contains three columns of data: Month, Product and Sales.

Using a list of text values containing these column names in the second parameter of Csv.Document, as in the following M query:

let
    Source = File.Contents("C:\CSVTests\SourceFile.csv"),
    ToCSV = Csv.Document(Source,{"Month","Product","Sales"})
in
    ToCSV

Returns a table with the correct column names, but with the data type of every column set to text.

What if I know that only the Month and Product columns contain text and that the Sales column should be a number? Instead of a list of column names, using a table type allows you to set names and data types for each column:

let
    Source = File.Contents("C:\CSVTests\SourceFile.csv"),
    ToCSV = Csv.Document(
	Source,
	type table 
		[#"Month"=text, #"Product"=text, #"Sales"=number])
in
    ToCSV

Notice that the Sales column now has its data type set to number.

The Delimiter parameter

If you have used a list of column names or a table type in the second parameter of Csv.Document, you can use the third parameter to control how each row of data is split up into columns. There are two ways you can do this.

First of all, you can pass any piece of text to the third parameter to specify a delimiter. Unlike the Delimiter field of the record in the second parameter described above, this can be a single character or multiple characters. For example, the following M query:

let
    Source = "abcdefg",
    ToCSV = Csv.Document(Source,{"first","second"},"c")
in
    ToCSV

Returns a table with two columns containing ab and defg.

And the following M query:

let
    Source = "abcdefg",
    ToCSV = Csv.Document(Source,{"first","second"},"cd")
in
    ToCSV

Returns a table with two columns containing ab and efg.

Instead of text, the Delimiter parameter can also take a list of integer values, to allow you to handle fixed-width files. This functionality is available from the UI when you choose the Fixed Width option from the Delimiter dropdown box when you connect to a CSV file for the first time.

Each integer in the list represents the number of characters from the start of the row that marks the start of each column; as a result, each integer in the list has to be larger than the preceding integer. The values are 0-based, so 0 marks the start of a row. For example, the M query:

let
    Source = "abcdefg",
    ToCSV = Csv.Document(Source,{"first","second","third"},{0,3,5})
in
    ToCSV

Returns a table with three columns containing abc, de and fg.

The ExtraValues parameter

The ExtraValues parameter allows you to handle scenarios where there are extra columns on the end of lines. This isn’t quite as useful as it sounds though: most of the time when the number of columns varies in a CSV file it’s because there are unquoted line breaks in text columns, in which case you should make sure your source data always wraps text in double quotes and use the QuoteStyle option described above, or if you can’t fix your data source, see this post.

The ExtraValues parameter can take one of three values of type ExtraValues: ExtraValues.List, ExtraValues.Ignore and ExtraValues.Error.


Consider a version of the CSV file used above with two extra values on the end of the second row.

The following query reads data from this file:

let
    Source = File.Contents("C:\CSVTests\SourceFile.csv"),
    ToCSV = Csv.Document(Source,{"Month","Product","Sales"})
in
    ToCSV

Because we have specified that there are three columns in the table, the error “There were more columns in the result than expected” is returned for each cell on the second line.

The same thing happens when ExtraValues.Error is explicitly specified in the fourth parameter, like so:

let
    Source = File.Contents("C:\CSVTests\SourceFile.csv"),
    ToCSV = 
	Csv.Document(
		Source,
		{"Month","Product","Sales"},
		",",
		ExtraValues.Error
	)
in
    ToCSV

If you set ExtraValues.Ignore instead, though:

let
    Source = File.Contents("C:\CSVTests\SourceFile.csv"),
    ToCSV = 
	Csv.Document(
		Source,
		{"Month","Product","Sales"},
		",",
		ExtraValues.Ignore
	)
in
    ToCSV

The extra columns are ignored and no errors are returned.

Setting ExtraValues.List allows you to capture any extra column values in a list; however, if you want to do this you will need to add an extra column to your table to hold these values. For example, notice in this query that four columns rather than three have been defined:

let
    Source = File.Contents("C:\CSVTests\SourceFile.csv"),
    ToCSV = 
	Csv.Document(
		Source,
		{"Month","Product","Sales","Extra Columns"},
		",",
		ExtraValues.List)
in
    ToCSV

In the output, the Extra Columns column contains an empty list on the first and third rows. On the second row, however, it contains a list with two values in it – the two values from the two extra columns on that line.

The Encoding parameter

The Encoding parameter corresponds directly to the Encoding field used when you pass a record to the second parameter, as described above. The one difference is that it can take either an integer or a value of type TextEncoding; the TextEncoding data type only contains values for some of the more common code pages (for example TextEncoding.Windows, which is equivalent to code page 1252), so the only reason to use it would be for readability.

As a result, the following two M queries:

let
    Source = File.Contents("C:\CSVTests\SourceFile.csv"),
    ToCSV = Csv.Document(
	Source, 
	{"Month","Product","Sales"},
	",",
	ExtraValues.Ignore,
	TextEncoding.Windows
	)
in
    ToCSV

let
    Source = File.Contents("C:\CSVTests\SourceFile.csv"),
    ToCSV = Csv.Document(
	Source, 
	{"Month","Product","Sales"},
	",",
	ExtraValues.Ignore,
	1252
	)
in
    ToCSV

…return the same result.

What about CsvStyle and QuoteStyle?

If you specify a list of column names or a table type in the second parameter of Csv.Document there’s no way to set CsvStyle or QuoteStyle – these options are only available when you use a record in the second parameter. The behaviour you get is the same as CsvStyle.QuoteAlways and QuoteStyle.Csv. For example, using the same source file with a quoted line break in it as before, this M query:

let
    Source = File.Contents("C:\CSVTests\SourceFileWithQuotes.csv"),
    ToCSV = Csv.Document(
		Source, 
		{"Month","Sales"},
		",",
		ExtraValues.Ignore,
		1252)
in
    ToCSV

returns a single row, with the line break preserved inside the text value – the same behaviour as QuoteStyle.Csv.
