使用格式文件批量插入SQL中的csv,以删除双引号

时间:2022-09-15 13:32:45

I have a CSV file in the following format:

我有以下格式的CSV文件:

data, data, "timestamp", data, data, data, data, data

I need to remove the double quotes from around the timestamp data, then insert it into the table as a DATETIME data type.

我需要从timestamp数据中删除双引号,然后将其作为DATETIME数据类型插入到表中。

After researching formatfiles, I have come up with this:

在研究了格式文件之后,我想到了以下几点:

10.0
8
1   SQLCHAR 0   12  ","     1   Data        SQL_Latin1_General_CP1_CI_AS
2   SQLCHAR 0   12  ","     2   Data        SQL_Latin1_General_CP1_CI_AS
3   SQLCHAR 0   26  "",""   3   Timestamp   SQL_Latin1_General_CP1_CI_AS
4   SQLCHAR 0   41  ","     4   Data        SQL_Latin1_General_CP1_CI_AS
5   SQLCHAR 0   41  ","     5   Data        SQL_Latin1_General_CP1_CI_AS
6   SQLCHAR 0   41  ","     6   Data        SQL_Latin1_General_CP1_CI_AS
7   SQLCHAR 0   5   ","     7   Data        SQL_Latin1_General_CP1_CI_AS
8   SQLCHAR 0   12  "0x0a"  6   Data        SQL_Latin1_General_CP1_CI_AS

where the 3rd row, Timestamp, is the item with the double quotes around it.

第3行,时间戳,是带有双引号的项。

Attempting to use this file in a Bulk Insert results in the error message

尝试在批量插入中使用此文件会导致错误消息

Msg 4823, Level 16, State 1, Line 2 Cannot bulk load. Invalid column number in the format file.

Msg 4823, 16级,状态1,2号线无法装载。格式文件中的列号无效。

Is there a way I can alter the formatfile to do what I need? I'm using MSSQL.

是否有一种方法可以改变格式文件来执行我需要的操作?我使用该软件。

2 个解决方案

#1


5  

Well your invalid column number error is probably caused by having column number 6 repeated instead of column number 8 in your destination field numbers.

您的无效列数错误很可能是由于您的目标字段号中的第8列而不是第8列而造成的。

But for removing the "" you need to use \" in your delimiter for column 2 and 3 like so...

但是要删除列2和列3中的“您需要使用\”,就像这样……

SQLCHAR 0   12  ","     1   Data        SQL_Latin1_General_CP1_CI_AS
SQLCHAR 0   12  ",\""   2   Data        SQL_Latin1_General_CP1_CI_AS
SQLCHAR 0   26  "\","   3   Timestamp   SQL_Latin1_General_CP1_CI_AS
SQLCHAR 0   41  ","     4   Data        SQL_Latin1_General_CP1_CI_AS
SQLCHAR 0   41  ","     5   Data        SQL_Latin1_General_CP1_CI_AS
SQLCHAR 0   41  ","     6   Data        SQL_Latin1_General_CP1_CI_AS
SQLCHAR 0   5   ","     7   Data        SQL_Latin1_General_CP1_CI_AS
SQLCHAR 0   12  "\r\n"  6   Data        SQL_Latin1_General_CP1_CI_AS

-- note: use \r\n for row terminator for an Excel file saved as CSV

so for the delimter for column 2 is actually ," and your delimiter for your column 3 (timestamp) is ", - the " gets removed from the data as it's part of the delimiter.

因此,对于列2的delimter,实际上是,“您的列3的分隔符(时间戳)是”,“从数据中删除,因为它是分隔符的一部分。”

Note: if you have column titles in your first row this will not work correctly i.e. if your first row contains column headers like...

注意:如果您的第一行中有列标题,这将无法正常工作,例如,如果您的第一行包含列标题,如……

Field1Name,Field2Name,Timestamp,Field3Name ...

then the delimiters above will not work for this row because there are no quotes around the column header Timestamp. The result of this is that your first row will have correct data in columns 1 and 2 but then column 3 doesn't have a valid delimiter in row one (",) so it contains all the rest of the column headers and the first 3 fields of row 2 until it finally finds a correct delimiter (",) at the end column 3 on row 2. Then the rest of row 2 appears in the columns after. It's a mess. And you can't get round it by using

然后上面的分隔符对这一行无效,因为列头时间戳周围没有引号。的结果是,你的第一行会有正确的数据列1和2然后第三列在第一行没有一个有效的分隔符(“),它包含所有其他列标题和第一个3的第2行,直到它终于找到了正确的分隔符(“),最后3行2列。然后,第2行其余部分出现在后面的列中。这是一个烂摊子。你不能通过使用来绕过它

FIRSTROW = 2

You have to either remove the header row, or put quotes around your column 3 title -

你必须删除标题行,或者在你的第3栏标题周围加上引号

Field1Name,Field2Name,"Timestamp",Field3Name ...

or remove the quotes via your SQL once you have finished the bulk insert.

或者在完成批量插入之后,通过SQL删除引号。

#2


0  

would this work:

将这项工作:

SQLCHAR 0   12  ","     1   Data        SQL_Latin1_General_CP1_CI_AS
SQLCHAR 0   12  ","     2   Data        SQL_Latin1_General_CP1_CI_AS
SQLCHAR 0   26  '","'   3   Timestamp   SQL_Latin1_General_CP1_CI_AS
SQLCHAR 0   41  ","     4   Data        SQL_Latin1_General_CP1_CI_AS
SQLCHAR 0   41  ","     5   Data        SQL_Latin1_General_CP1_CI_AS
SQLCHAR 0   41  ","     6   Data        SQL_Latin1_General_CP1_CI_AS
SQLCHAR 0   5   ","     7   Data        SQL_Latin1_General_CP1_CI_AS
SQLCHAR 0   12  "0x0a"  6   Data        SQL_Latin1_General_CP1_CI_AS

#1


5  

Well your invalid column number error is probably caused by having column number 6 repeated instead of column number 8 in your destination field numbers.

您的无效列数错误很可能是由于您的目标字段号中的第8列而不是第8列而造成的。

But for removing the "" you need to use \" in your delimiter for column 2 and 3 like so...

但是要删除列2和列3中的“您需要使用\”,就像这样……

SQLCHAR 0   12  ","     1   Data        SQL_Latin1_General_CP1_CI_AS
SQLCHAR 0   12  ",\""   2   Data        SQL_Latin1_General_CP1_CI_AS
SQLCHAR 0   26  "\","   3   Timestamp   SQL_Latin1_General_CP1_CI_AS
SQLCHAR 0   41  ","     4   Data        SQL_Latin1_General_CP1_CI_AS
SQLCHAR 0   41  ","     5   Data        SQL_Latin1_General_CP1_CI_AS
SQLCHAR 0   41  ","     6   Data        SQL_Latin1_General_CP1_CI_AS
SQLCHAR 0   5   ","     7   Data        SQL_Latin1_General_CP1_CI_AS
SQLCHAR 0   12  "\r\n"  6   Data        SQL_Latin1_General_CP1_CI_AS

-- note: use \r\n for row terminator for an Excel file saved as CSV

so for the delimter for column 2 is actually ," and your delimiter for your column 3 (timestamp) is ", - the " gets removed from the data as it's part of the delimiter.

因此,对于列2的delimter,实际上是,“您的列3的分隔符(时间戳)是”,“从数据中删除,因为它是分隔符的一部分。”

Note: if you have column titles in your first row this will not work correctly i.e. if your first row contains column headers like...

注意:如果您的第一行中有列标题,这将无法正常工作,例如,如果您的第一行包含列标题,如……

Field1Name,Field2Name,Timestamp,Field3Name ...

then the delimiters above will not work for this row because there are no quotes around the column header Timestamp. The result of this is that your first row will have correct data in columns 1 and 2 but then column 3 doesn't have a valid delimiter in row one (",) so it contains all the rest of the column headers and the first 3 fields of row 2 until it finally finds a correct delimiter (",) at the end column 3 on row 2. Then the rest of row 2 appears in the columns after. It's a mess. And you can't get round it by using

然后上面的分隔符对这一行无效,因为列头时间戳周围没有引号。的结果是,你的第一行会有正确的数据列1和2然后第三列在第一行没有一个有效的分隔符(“),它包含所有其他列标题和第一个3的第2行,直到它终于找到了正确的分隔符(“),最后3行2列。然后,第2行其余部分出现在后面的列中。这是一个烂摊子。你不能通过使用来绕过它

FIRSTROW = 2

You have to either remove the header row, or put quotes around your column 3 title -

你必须删除标题行,或者在你的第3栏标题周围加上引号

Field1Name,Field2Name,"Timestamp",Field3Name ...

or remove the quotes via your SQL once you have finished the bulk insert.

或者在完成批量插入之后,通过SQL删除引号。

#2


0  

would this work:

将这项工作:

SQLCHAR 0   12  ","     1   Data        SQL_Latin1_General_CP1_CI_AS
SQLCHAR 0   12  ","     2   Data        SQL_Latin1_General_CP1_CI_AS
SQLCHAR 0   26  '","'   3   Timestamp   SQL_Latin1_General_CP1_CI_AS
SQLCHAR 0   41  ","     4   Data        SQL_Latin1_General_CP1_CI_AS
SQLCHAR 0   41  ","     5   Data        SQL_Latin1_General_CP1_CI_AS
SQLCHAR 0   41  ","     6   Data        SQL_Latin1_General_CP1_CI_AS
SQLCHAR 0   5   ","     7   Data        SQL_Latin1_General_CP1_CI_AS
SQLCHAR 0   12  "0x0a"  6   Data        SQL_Latin1_General_CP1_CI_AS