How do I import a variable record length CSV file using SSIS?

Date: 2021-10-06 05:08:08

Has anyone been able to get a variable record length text file (CSV) into SQL Server via SSIS?

I have tried repeatedly to load a CSV file into a SQL Server table using SSIS, where the input file has varying record lengths. For this question, the two record lengths are 63 and 326 bytes. Records of both lengths will be imported into the same 326-byte-wide table.

There are over 1 million records to import.
I have no control of the creation of the import file.
I must use SSIS.
I have confirmed with MS that this has been reported as a bug. I have tried several workarounds. Most of them involved writing custom code to intercept each record, and I can't seem to get that to work the way I want.

5 Solutions

#1


4  

I had a similar problem, and used custom code (Script Task), and a Script Component under the Data Flow tab.

I have a Flat File Source feeding into a Script Component. Inside it, I use code to manipulate the incoming data and fix it up for the destination.

My issue was that the provider was using '000000' to mean "no date available", and another column had a padding/trim issue.
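
The kind of per-row cleanup described above might look roughly like the sketch below. It is written in Python only to show the logic; in SSIS the same steps would live in the Script Component's own language. The function names are hypothetical, and the `YYMMDD` layout for real dates is an assumption, since the answer only states the `'000000'` sentinel.

```python
from datetime import date, datetime
from typing import Optional

NO_DATE = "000000"  # sentinel the provider sends when no date is available


def parse_provider_date(raw: str) -> Optional[date]:
    """Map the sentinel (or an empty value) to None, otherwise parse the date.

    Assumption: real dates arrive as YYMMDD. Adjust the format string if the
    provider uses a different layout.
    """
    value = raw.strip()
    if value == "" or value == NO_DATE:
        return None
    return datetime.strptime(value, "%y%m%d").date()


def clean_text(raw: str) -> str:
    """Remove the padding around a text column before it reaches the table."""
    return raw.strip()
```

Returning `None` for the sentinel lets the destination column receive NULL instead of an invalid date.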

#2


1  

You should have no problem importing this file. Just make sure that when you create the Flat File connection manager you select the Delimited format, then set each SSIS column length to the maximum length of that column in the file so it can accommodate any data.

It looks like you are using the Fixed width format, which is not correct for CSV files (since you have variable-length columns), or maybe you've set the column delimiter incorrectly.
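
Before changing the connection manager, it may help to check whether the records differ in column count or only in column width. The sketch below is one possible diagnostic, written in Python outside SSIS; `field_count_histogram` is a hypothetical helper, and it assumes a comma delimiter with standard double-quote quoting.

```python
import csv
import io
from collections import Counter


def field_count_histogram(text: str, delimiter: str = ",") -> Counter:
    """Count how many records have each number of fields.

    The csv module is used so that delimiters inside quoted values
    are not counted as column breaks. Blank lines are ignored.
    """
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    return Counter(len(row) for row in reader if row)


sample = 'a,b,c\r\nd,e\r\n"x,y",z,w\r\n'
histogram = field_count_histogram(sample)
```

If the histogram shows a single field count, a Delimited connection with generous column lengths should be enough. More than one count means the short records are missing trailing columns.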

#3


1  

Same issue. In my case, the target CSV file has header and footer records whose formats are completely different from the body of the file; the header/footer are used to validate completeness of file processing (date/times, record counts, amount totals - a "checksum" by any other name ...). This is a common format for files from "mainframe" environments, and though I haven't started on it yet, I expect to have to use scripting to strip off the header/footer, save the rest as a new file, process the new file, and then do the validation. Can't exactly expect MS to have that out of the box (but it sure would be nice, wouldn't it?).
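
The strip-then-validate approach could be sketched as follows. This is Python used for illustration only, and the layout is entirely assumed: the first non-blank line is taken as the header, the last as the footer, and the footer's second comma-separated field is taken as the body record count. The real file's layout would need to be substituted, and the function names are hypothetical.

```python
from typing import List, Tuple


def split_header_footer(lines: List[str]) -> Tuple[str, List[str], str]:
    """Separate the header, the body records, and the footer."""
    records = [line for line in lines if line.strip()]
    if len(records) < 2:
        raise ValueError("expected at least a header and a footer record")
    return records[0], records[1:-1], records[-1]


def footer_record_count(footer: str, position: int = 1) -> int:
    """Read the record count from the footer (field position is an assumption)."""
    return int(footer.split(",")[position].strip())


def validated_body(lines: List[str]) -> List[str]:
    """Return the body records, or raise if the footer count does not match."""
    _header, body, footer = split_header_footer(lines)
    expected = footer_record_count(footer)
    if expected != len(body):
        raise ValueError(f"footer says {expected} records, found {len(body)}")
    return body
```

The validated body can then be written to a new file and handed to the ordinary Flat File Source.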

#4


0  

Why can't you just import it as a text file and set the column delimiter to "," and the row delimiter to CRLF?

#5


0  

You can write a Script Task in C# that iterates through each line and pads it with the number of commas needed to fill out the record. This assumes, of course, that all of the data lines up with the proper columns.

That is, as you read each record, count the number of commas. Then append commas to the end of the record until it has the correct number.
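
A minimal sketch of the padding step is shown below. It is in Python to illustrate the logic; the answer suggests a C# Script Task for the real package. `pad_record` and `expected_fields` are hypothetical names, and the sketch counts fields with a CSV parser rather than counting raw commas so that commas inside quoted values are not miscounted.

```python
import csv


def pad_record(line: str, expected_fields: int, delimiter: str = ",") -> str:
    """Append delimiters until the record has expected_fields columns."""
    record = line.rstrip("\r\n")
    # Parse one record; an empty line still counts as a single empty field.
    fields = next(csv.reader([record], delimiter=delimiter), [])
    missing = expected_fields - max(len(fields), 1)
    if missing < 0:
        raise ValueError(
            f"record has {len(fields)} fields, expected at most {expected_fields}"
        )
    return record + delimiter * missing


short_record = pad_record("a,b", 4)
quoted_record = pad_record('"x,y",z\r\n', 3)
```

Padding only appends empty trailing columns, so it is correct only if the short records are missing columns at the end, as the answer notes.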

Excel has an issue that causes this kind of file to be created when converting to CSV.

If you can do this "by hand", the best way to solve it is to open the file in Excel, create a column at the "end" of the record, and fill it all the way down with 1s or some other character.

Nasty, but can be a quick solution.

If you don't have the ability to do this, you can do the same thing programmatically as described above.
