How do I import text files with the same name and schema but different directories into a database?

Time: 2021-12-29 16:16:45

I need to import multiple txt files with the same names and the same schema into a single table in a SQL Server 2008 database. The problem is that they are all in different directories:

TEST
     201304
            sample1.txt
            sample2.txt
     201305
            sample1.txt
            sample2.txt
     201306
            sample1.txt
            sample2.txt

Is there any way in SSIS that I can set this up?

1 solution

#1


16  

Yes. Use a Foreach Loop Container with the Foreach File enumerator and check its Traverse subfolders option.
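For readers who have not used the enumerator, the behaviour it provides can be pictured with a short Python analogy. This is only a sketch of the idea (walk a root folder, descend into subfolders, return the full path of every file matching a mask); it is not how SSIS implements it, and the function name `enumerate_files` is made up for illustration.

```python
import fnmatch
import os


def enumerate_files(source_folder, file_mask="*.txt"):
    """Yield full paths of files under source_folder whose names match file_mask."""
    for current_dir, subdirs, file_names in os.walk(source_folder):
        # Sort in place so folders and files are visited in a predictable order.
        subdirs.sort()
        for name in sorted(file_names):
            if fnmatch.fnmatch(name, file_mask):
                yield os.path.join(current_dir, name)
```

Calling `list(enumerate_files(r"C:\ssisdata\so\TEST"))` against the folder layout from the question would return the path of each sample1.txt and sample2.txt, one folder at a time.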

Edit

Apparently my answer wasn't cromulent enough, so please accept this working code which illustrates what my brief original answer stated.

Source data

I created 3 folders as described above, each containing the files sample1.txt and sample2.txt.

C:\>MKDIR SSISDATA\SO\TEST\201304
C:\>MKDIR SSISDATA\SO\TEST\201305
C:\>MKDIR SSISDATA\SO\TEST\201306

The contents of the file are below. In each copy of the file, in each folder, the ID value is incremented and the text value is changed, to prove that the package has picked up a new file.

ID,value
1,ABC
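To reproduce the test data quickly, a script along these lines could build the folder tree and the six files. It is a sketch under assumptions: the function name `build_sample_tree` and the `VALUE<n>` text values are invented for illustration (only the `ID,value` header and the incrementing ID come from the description above), and the files are written with CRLF line endings in code page 1252 to match the flat file format used later.

```python
from pathlib import Path


def build_sample_tree(root):
    """Create root/TEST/<yyyymm>/sample1.txt and sample2.txt with one distinct row each."""
    root = Path(root)
    created = []
    next_id = 1
    for folder in ("201304", "201305", "201306"):
        target = root / "TEST" / folder
        target.mkdir(parents=True, exist_ok=True)
        for name in ("sample1.txt", "sample2.txt"):
            path = target / name
            # Header row plus one data row; the text value is a hypothetical placeholder.
            text = f"ID,value\r\n{next_id},VALUE{next_id}\r\n"
            path.write_bytes(text.encode("cp1252"))
            created.append(path)
            next_id += 1
    return created
```

Running `build_sample_tree(r"C:\ssisdata\so")` would give the same layout as the MKDIR commands above, with IDs 1 through 6 spread across the files.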

Package generation

This part assumes you have BIDS Helper installed. It is not required for the solution; it simply provides a common framework that future readers can use to reproduce it.

I created a BIML file with the following content. Even though the package includes a step that creates the table, I had to run that statement on the target server before generating the package.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <!-- Create a basic flat file source definition -->
    <FileFormats>
        <FlatFileFormat
            Name="FFFSrc"
            CodePage="1252"
            RowDelimiter="CRLF"
            IsUnicode="false"
            FlatFileType="Delimited"
            ColumnNamesInFirstDataRow="true"
        >
            <Columns>
                <Column
                    Name="ID"
                    DataType="Int32"
                    Delimiter=","
                    ColumnType="Delimited"
                />
                <Column
                    Name="value"
                    DataType="AnsiString"
                    Delimiter="CRLF"
                    InputLength="20"
                    MaximumWidth="20"
                    Length="20"
                    CodePage="1252"
                    ColumnType="Delimited"
                    />
            </Columns>
        </FlatFileFormat>
    </FileFormats>

    <!-- Create a connection that uses the flat file format defined above-->
    <Connections>
        <FlatFileConnection
            Name="FFSrc"
            FileFormat="FFFSrc"
            FilePath="C:\ssisdata\so\TEST\201306\sample1.txt"
            DelayValidation="true"
        />
        <OleDbConnection
            Name="tempdb"
            ConnectionString="Data Source=localhost\dev2012;Initial Catalog=tempdb;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;"
        />

    </Connections>

    <!-- Create a package to illustrate how to apply an expression on the Connection Manager -->
    <Packages>
        <Package
            Name="so_19957451"
            ConstraintMode="Linear"
        >
            <Connections>
                <Connection ConnectionName="tempdb"/>
                <Connection ConnectionName="FFSrc">
                    <Expressions>
                        <!-- Assign a variable to the ConnectionString property. 
                        The syntax for this is ConnectionManagerName.Property -->
                        <Expression PropertyName="FFSrc.ConnectionString">@[User::CurrentFileName]</Expression>
                    </Expressions>
                </Connection>
            </Connections>

            <!-- Create the variables: the current file, the file mask, the source folder, a row counter and the target table -->
            <Variables>
                <Variable Name="CurrentFileName" DataType="String">C:\ssisdata\so\TEST\201306\sample1.txt</Variable>
                <Variable Name="FileMask" DataType="String">*.txt</Variable>
                <Variable Name="SourceFolder" DataType="String">C:\ssisdata\so\TEST</Variable>
                <Variable Name="RowCountInput" DataType="Int32">0</Variable>
                <Variable Name="TargetTable" DataType="String">[dbo].[so_19957451]</Variable>
            </Variables>

            <!-- Create the table if it is missing, then add a foreach file enumerator that uses the variables above -->
            <Tasks>
                <ExecuteSQL 
                    Name="SQL Create Table"
                    ConnectionName="tempdb">
                    <DirectInput>
                        IF NOT EXISTS (SELECT * FROM sys.tables T WHERE T.name = 'so_19957451' and T.schema_id = schema_id('dbo'))
                        BEGIN
                            CREATE TABLE dbo.so_19957451(ID int NOT NULL, value varchar(20) NOT NULL);
                        END
                    </DirectInput>
                </ExecuteSQL>
                <ForEachFileLoop
                    Name="FELC Consume files"
                    FileSpecification="*.csv"
                    ProcessSubfolders="true"
                    RetrieveFileNameFormat="FullyQualified"
                    Folder="C:\"
                    ConstraintMode="Linear"
                >
                    <!-- Define the expressions that make the input folder and the file mask 
                    driven by variable values; at run time they override the FileSpecification and Folder attributes above -->
                    <Expressions>
                        <Expression PropertyName="Directory">@[User::SourceFolder]</Expression>
                        <Expression PropertyName="FileSpec">@[User::FileMask]</Expression>
                    </Expressions>
                    <VariableMappings>
                        <!-- Notice that we use the convention of User.Variable name here -->
                        <VariableMapping
                            Name="0"
                            VariableName="User.CurrentFileName"
                        />
                    </VariableMappings>
                    <Tasks>
                        <Dataflow Name="DFT Import file" DelayValidation="true">
                            <Transformations>
                                <FlatFileSource Name="FFS Sample" ConnectionName="FFSrc"/>
                                <RowCount Name="RC Source" VariableName="User.RowCountInput"/>
                                <OleDbDestination 
                                    Name="OLE_DST"
                                    ConnectionName="tempdb">
                                    <TableFromVariableOutput VariableName="User.TargetTable"/>                                  
                                </OleDbDestination>
                            </Transformations>
                        </Dataflow>
                    </Tasks>
                </ForEachFileLoop>
            </Tasks>
        </Package>
    </Packages>
</Biml>

Right-click the BIML file and select Generate SSIS Package. At this point, you should have a package named so_19957451 added to your current SSIS project.

Package configuration

There's no need for any configuration because it's already been done via BIML but moar screenshots make for better answers.

This is the basic package

Here are my variables

Configuration of the Foreach Loop, as called out in the MSDN article, together with my note to select the Traverse subfolders option

Assign the value generated on each iteration of the loop to the variable CurrentFileName

The flat file connection manager (FFSrc) has an expression applied to its ConnectionString property to ensure it uses the variable @[User::CurrentFileName]. This changes the source file on each iteration of the loop.
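The overall flow of the package (enumerate the files, point the source at the current file, append its rows to one table) can be sketched outside SSIS as well. The Python below is an analogy only: it swaps in an in-memory-capable SQLite connection for the SQL Server tempdb connection, and the function name `import_files` is hypothetical. The `current_file` loop variable plays the role of @[User::CurrentFileName].

```python
import csv
import sqlite3
from pathlib import Path


def import_files(source_folder, connection, file_mask="*.txt"):
    """Load every matching file under source_folder into one table; return rows loaded."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS so_19957451 (ID INTEGER NOT NULL, value TEXT NOT NULL)"
    )
    total_rows = 0
    # rglob descends into subfolders, like the Traverse subfolders option.
    for current_file in sorted(Path(source_folder).rglob(file_mask)):
        if not current_file.is_file():
            continue
        # The file being read changes on each iteration; the table does not.
        with open(current_file, newline="", encoding="cp1252") as handle:
            rows = [(int(r["ID"]), r["value"]) for r in csv.DictReader(handle)]
        connection.executemany(
            "INSERT INTO so_19957451 (ID, value) VALUES (?, ?)", rows
        )
        total_rows += len(rows)
    connection.commit()
    return total_rows
```

For example, `import_files(r"C:\ssisdata\so\TEST", sqlite3.connect(":memory:"))` would read each sample file in turn and return the number of rows inserted.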

Execution results

Results from the database

Match the output from the package execution

Information: 0x402090DC at DFT Import file, FFS Sample [2]: The processing of file "C:\ssisdata\so\TEST\201304\sample1.txt" has started.
Information: 0x402090DD at DFT Import file, FFS Sample [2]: The processing of file "C:\ssisdata\so\TEST\201304\sample1.txt" has ended.
Information: 0x402090DC at DFT Import file, FFS Sample [2]: The processing of file "C:\ssisdata\so\TEST\201304\sample2.txt" has started.
Information: 0x402090DD at DFT Import file, FFS Sample [2]: The processing of file "C:\ssisdata\so\TEST\201304\sample2.txt" has ended.
Information: 0x402090DC at DFT Import file, FFS Sample [2]: The processing of file "C:\ssisdata\so\TEST\201305\sample1.txt" has started.
Information: 0x402090DD at DFT Import file, FFS Sample [2]: The processing of file "C:\ssisdata\so\TEST\201305\sample1.txt" has ended.
Information: 0x402090DC at DFT Import file, FFS Sample [2]: The processing of file "C:\ssisdata\so\TEST\201305\sample2.txt" has started.
Information: 0x402090DD at DFT Import file, FFS Sample [2]: The processing of file "C:\ssisdata\so\TEST\201305\sample2.txt" has ended.
Information: 0x402090DC at DFT Import file, FFS Sample [2]: The processing of file "C:\ssisdata\so\TEST\201306\sample1.txt" has started.
Information: 0x402090DD at DFT Import file, FFS Sample [2]: The processing of file "C:\ssisdata\so\TEST\201306\sample1.txt" has ended.
Information: 0x402090DC at DFT Import file, FFS Sample [2]: The processing of file "C:\ssisdata\so\TEST\201306\sample2.txt" has started.
Information: 0x402090DD at DFT Import file, FFS Sample [2]: The processing of file "C:\ssisdata\so\TEST\201306\sample2.txt" has ended.
