How to handle incremental loads on large datasets with SSIS

Time: 2023-01-28 23:46:14

I have 2 tables (~4 million rows) on which I have to run insert/update actions for matching and non-matching records. I am pretty confused about which method to use for the incremental load. Should I use the Lookup component or the new SQL Server MERGE statement? And will there be much of a performance difference?


3 Solutions

#1


I've run into this exact problem a few times, and I've always had to resort to loading the complete dataset into SQL Server via ETL and then manipulating it with stored procs. Updating the data on the fly in SSIS transforms always seemed to take way, way too long.
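
To make the "load everything into a staging table, then fix it up in the database" idea concrete, here is a minimal sketch. It is not SSIS or T-SQL: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the table names (target, staging), the columns (id, name), and the function incremental_load are all made up for illustration. The two set-based statements play the role that a stored proc (or a MERGE) would play on the server.

```python
import sqlite3


def incremental_load(conn, rows):
    """Bulk-load rows into staging, then apply them to target with set-based SQL.

    Returns (updated_count, inserted_count). Table and column names are
    hypothetical; SQLite stands in for SQL Server here.
    """
    cur = conn.cursor()
    # Step 1: plain bulk load into an empty staging table (the "ETL" part).
    cur.execute("DELETE FROM staging")
    cur.executemany("INSERT INTO staging (id, name) VALUES (?, ?)", rows)

    # Step 2: update target rows that match a staging row and actually changed.
    cur.execute(
        """
        UPDATE target
        SET name = (SELECT s.name FROM staging s WHERE s.id = target.id)
        WHERE EXISTS (
            SELECT 1 FROM staging s
            WHERE s.id = target.id AND s.name <> target.name
        )
        """
    )
    updated = cur.rowcount

    # Step 3: insert staging rows that have no match in target.
    cur.execute(
        """
        INSERT INTO target (id, name)
        SELECT s.id, s.name FROM staging s
        WHERE NOT EXISTS (SELECT 1 FROM target t WHERE t.id = s.id)
        """
    )
    inserted = cur.rowcount
    conn.commit()
    return updated, inserted


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("CREATE TABLE staging (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany("INSERT INTO target VALUES (?, ?)", [(1, "a"), (2, "b")])

# One changed row (id 2) and one new row (id 3) arrive in the increment.
updated, inserted = incremental_load(conn, [(2, "B"), (3, "c")])
final_rows = conn.execute("SELECT id, name FROM target ORDER BY id").fetchall()
```

The point of the pattern is that the database does the matching in two statements over the whole set, instead of the data flow issuing one update per row.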


#2


The SSIS Lookup has three caching modes (full, partial, and no cache), which are key to getting the best performance from it. If you are looking up against a large table, full cache mode will eat up a lot of your memory and could hinder performance. If the table you are looking up against is small, keep it in memory. You've also got to decide whether the data you are looking up against is changing as you process data. If it is, then you don't want to cache.
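
A rough way to picture the trade-off between caching and not caching, sketched in plain Python rather than in SSIS itself. Everything here is an assumption for illustration: sqlite3 stands in for the reference database, and build_full_cache, lookup_no_cache, and split_rows are hypothetical helpers, not part of any SSIS API. A "full cache" reads the reference table once into memory; "no cache" queries it for every incoming row.

```python
import sqlite3


def build_full_cache(conn):
    """Full-cache style: read the whole reference table once into a dict."""
    return dict(conn.execute("SELECT id, name FROM target"))


def lookup_no_cache(conn, key):
    """No-cache style: one query per incoming row, so it always sees current data."""
    row = conn.execute("SELECT name FROM target WHERE id = ?", (key,)).fetchone()
    return row[0] if row else None


def split_rows(incoming, lookup):
    """Route (id, name) rows into matched / unmatched lists using a lookup function."""
    matched, unmatched = [], []
    for key, name in incoming:
        (matched if lookup(key) is not None else unmatched).append((key, name))
    return matched, unmatched


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany("INSERT INTO target VALUES (?, ?)", [(1, "a"), (2, "b")])
incoming = [(2, "B"), (3, "c")]

# Cache is built before the reference table changes.
cache = build_full_cache(conn)
conn.execute("INSERT INTO target VALUES (3, 'c0')")

# The cached lookup still treats id 3 as unmatched; the live lookup sees it.
stale_split = split_rows(incoming, cache.get)
live_split = split_rows(incoming, lambda k: lookup_no_cache(conn, k))
```

The cached version costs memory proportional to the reference table and can go stale; the uncached version stays correct but pays a round trip per row, which is the part that hurts at millions of rows.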


Can you give us some more info on what you are doing, so I can formulate a more precise answer?


#3


Premature optimization is the root of all evil. I don't know about SSIS, but it's always too early to think about this.


4 million rows could be "large" or "small", depending on the type of data, and the hardware configuration you're using.

