How do I validate data in a file in SSIS before inserting it into the database?

Time: 2022-04-24 11:22:05

What I want to do is take data from a dbf file and insert it into a table, which I've already done. Since there are many files, a For-Each Container is being used. However, before inserting into the table, I want to look at the date fields and compare them to a date variable. If the dates match the variable, then move on to the next step of the flow. But if any of the dates don't match the variable, then that file and its contents are discarded and the next file is looked at.


How do I accomplish this in SSIS?


1 solution

#1


You're looking for the Conditional Split Component within your Data Flow Task.


Assuming your source column is MyDate and you have an SSIS Variable called @[User::ReferenceDate], you'd apply an expression like


[MyDate] == @[User::ReferenceDate]

That will evaluate to true when the dates match and false otherwise.

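One caveat: dbf date columns sometimes come through as DT_DATE with a time component attached, in which case a straight equality never matches. If you run into that, a sketch using the standard SSIS date-only cast on both sides keeps the comparison honest:

(DT_DBDATE)[MyDate] == (DT_DBDATE)@[User::ReferenceDate]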

In your Conditional Split, add a row into the component.


  • Output Name: DatesMatched
  • Condition: [MyDate] == @[User::ReferenceDate]

  • Default output name: DatesUnmatched

Now when you connect the output from this to your destination, it'll ask whether you want to route the data using the DatesMatched or DatesUnmatched path. Use the DatesMatched path.


As I re-read this, though: "if any of the dates don't match the variable, then that file and its contents are discarded" means you're looking at processing the file twice. The first pass reads it all in and validates it; the second, which only happens if validation passes, actually loads it to the database.


From your Conditional Split, add a Row Count component to the DatesUnmatched path. Have it write to a variable of type Integer/Int32 named CountDatesUnmatched. In a perfect world, that will be zero when the validation of the file completes.


In the Precedence Constraint between the Validation Data Flow and the actual Import Data Flow, double-click the connector line and change the evaluation criteria from Constraint to Expression and Constraint. Leave the value as Success and, in the Expression, use @[User::CountDatesUnmatched] == 0. That data flow will only light up if both conditions are true: parsing was successful and no rows were sent to the Row Count component.

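Put together, the control flow inside the loop ends up looking roughly like this (the task names are illustrative, not prescribed):

Foreach Loop Container
  Data Flow Task "Validate File"
      DBF Source -> Conditional Split -> (DatesUnmatched) -> Row Count
                                         (writes @[User::CountDatesUnmatched])
      |
      |  Success AND @[User::CountDatesUnmatched] == 0
      v
  Data Flow Task "Import File"
      DBF Source -> OLE DB Destination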

Finally, you can cheat, and sometimes this approach makes sense. If you're using an OLE DB Destination, you can leave MaximumInsertCommitSize at its default of 2147483647 (roughly 2 billion) with a data access mode of fast load. That translates to "everything commits or none of it does". It can lock up your target table and cause your transaction log to grow heavily, depending on how much data you're loading. Use the Conditional Split as described above, but for the DatesUnmatched path, induce a failure. A Derived Column with a divide-by-zero, or a Script Component with an explicit FireError event, will cause that transaction to go belly up. You'd need to do some magic in the OnError event handler to not abort the overall file processing, but it's a lazy hack (or one that is useful when double-reading the file is prohibitive but impacting the database is less so).

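For the Derived Column flavor, a minimal sketch placed on the DatesUnmatched path: any row that reaches it makes the denominator zero and fails the component at run time.

1 / ([MyDate] == @[User::ReferenceDate] ? 1 : 0)

For the FireError flavor, that would be a Script Component (transformation) on the same path. A minimal C# sketch, where Input0Buffer is the designer-generated buffer class and the sub-component name and message text are illustrative:

using System;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    // Runs once per row that arrives on the DatesUnmatched path.
    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        bool cancel;
        // Any FireError call fails the data flow, which rolls back the
        // single fast-load commit described above.
        ComponentMetaData.FireError(0, "Date validation",
            "Row date does not match ReferenceDate; aborting this file.",
            string.Empty, 0, out cancel);
    }
}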
