I have a number of functions which MOVE records from one table to another (generally for a form of archiving the data) and wondered if there was a "best practice" for doing this, or a more efficient method than I am currently using.
At the moment, I am running something like:
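-- Copy the rows that meet the archive criteria
INSERT INTO archive_table
SELECT [ROWID], [COL1], [COL2]
FROM live_table
WHERE <criteria>;
-- Delete every live row whose ROWID appears anywhere in archive_table
DELETE FROM live_table
WHERE [ROWID] IN
(
    SELECT [ROWID] FROM archive_table
);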
This also triggers a warning in our SQL performance software that the query may cause index suppression and performance degradation, because a SCAN is performed rather than a SEEK.
It is worth adding that archive_table is an exact copy of live_table, except that we have removed the identity property and primary key from the [ROWID] column. The archive table is not used in the live environment other than to receive the old data, as described.
[edit]
It seems that the answer from Alex provides a really simple solution to this. The comment about using a trigger doesn't solve the problem in this case, because the archiving happens a number of days later and the criteria depend on events during that period.
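-- One statement: delete the matching rows and write them into archive_table.
-- DELETED.* maps columns by position, so archive_table must match live_table's column order.
-- The OUTPUT ... INTO target cannot have enabled triggers or CHECK constraints, or take part in a FOREIGN KEY.
DELETE
FROM live_table
OUTPUT DELETED.* INTO archive_table
WHERE <criteria>;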
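SQLite has no `OUTPUT ... INTO`, so where it is not available the closest fallback is an `INSERT ... SELECT` and a `DELETE` that share one fixed predicate inside a single transaction. The sketch below shows that fallback using Python's built-in `sqlite3` module rather than SQL Server. The `closed_on` column and the `archive_closed_before` helper are hypothetical stand-ins for `<criteria>`.

```python
import sqlite3

# Hypothetical schema: archive_table mirrors live_table without the primary key.
SCHEMA = """
CREATE TABLE live_table (row_id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT, closed_on TEXT);
CREATE TABLE archive_table (row_id INTEGER, col1 TEXT, col2 TEXT, closed_on TEXT);
"""


def archive_closed_before(conn: sqlite3.Connection, cutoff: str) -> int:
    """Move rows with closed_on < cutoff into archive_table; return rows deleted."""
    # The cutoff is fixed up front so both statements see the same predicate.
    with conn:  # commits if both statements succeed, rolls back otherwise
        conn.execute(
            "INSERT INTO archive_table (row_id, col1, col2, closed_on) "
            "SELECT row_id, col1, col2, closed_on FROM live_table WHERE closed_on < ?",
            (cutoff,),
        )
        cur = conn.execute("DELETE FROM live_table WHERE closed_on < ?", (cutoff,))
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO live_table (col1, col2, closed_on) VALUES (?, ?, ?)",
        [("a", "a", "2024-01-05"), ("b", "b", "2024-03-01"), ("c", "c", "2024-01-20")],
    )
    conn.commit()
    print(archive_closed_before(conn, "2024-02-01"))  # 2
```

In SQL Server the single `OUTPUT DELETED.* INTO` statement is preferable. It evaluates the criteria once and does not need to look the moved rows up again in archive_table.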
3 answers
#1
1
If you have to move a large number of records from one table to another, I suggest you check whether you can partition your "active table". Each time, you copy the data from one (or more) partitions to the "archive table" and then drop those partitions. This will be much faster than deleting records from an "online" table.
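SQLite, used here through Python's `sqlite3` module, has no native table partitioning. This sketch therefore approximates the idea with one hypothetical table per month behind a `UNION ALL` view, loosely similar to a partitioned view. Archiving a month becomes one bulk copy plus a `DROP TABLE`, with no row-by-row delete on the live data. It only illustrates the shape of the approach. In SQL Server you would use a partitioned table and `ALTER TABLE ... SWITCH` instead.

```python
import sqlite3

# Hypothetical monthly "partitions"; the keys are trusted constants, never user input.
MONTHS = ["2024_01", "2024_02", "2024_03"]


def rebuild_view(conn: sqlite3.Connection, months: list[str]) -> None:
    """Recreate live_table as a UNION ALL view over the remaining month tables."""
    conn.execute("DROP VIEW IF EXISTS live_table")
    if months:
        union = " UNION ALL ".join(f"SELECT row_id, col1 FROM live_{m}" for m in months)
        conn.execute(f"CREATE VIEW live_table AS {union}")


def create_partitions(conn: sqlite3.Connection, months: list[str]) -> None:
    for m in months:
        conn.execute(f"CREATE TABLE live_{m} (row_id INTEGER PRIMARY KEY, col1 TEXT)")
    rebuild_view(conn, months)


def archive_partition(conn: sqlite3.Connection, months: list[str], month: str) -> list[str]:
    """Copy one month's table into archive_table, then drop that whole table."""
    remaining = [m for m in months if m != month]
    with conn:  # the copy and the drop commit together or not at all
        conn.execute(f"INSERT INTO archive_table SELECT row_id, col1 FROM live_{month}")
        conn.execute("DROP VIEW IF EXISTS live_table")  # drop the dependent view first
        conn.execute(f"DROP TABLE live_{month}")  # removes the table as a unit, not row by row
        rebuild_view(conn, remaining)
    return remaining
```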
#2
0
Worth adding that the archive_table is an exact copy of the live_table, with the exception that we have removed the identity and primary key off of the [ROWID] column and that this table is not used within the 'live' environment, other than having the old data inserted, as described.
I can't tell whether you removed the primary key from archive_table because you expect ROWIDs to be reused in live_table.
If I understand the context of your data correctly, and you archive rows some days after the data is completed, you can improve the query's performance by reducing or eliminating comparisons against archived rows that can no longer exist in live_table. Once a ROWID has moved from live_table to archive_table, there is no reason to look for it again.
Note: this assumes that ROWIDs are never reused in live_table and are always increasing.
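INSERT INTO archive_table
SELECT [ROWID], [COL1], [COL2]
FROM live_table
WHERE <criteria>;
-- Only compare against archived ROWIDs that could still be in live_table;
-- an index on archive_table([ROWID]) lets this range predicate seek
DELETE FROM live_table
WHERE [ROWID] IN
(
    SELECT [ROWID] FROM archive_table WHERE [ROWID] >= (SELECT MIN([ROWID]) FROM live_table)
);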
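The sketch below runs the same two statements against SQLite through Python's `sqlite3` module. It checks that the `MIN(row_id)` filter still moves exactly the right rows. The `col1 = ?` criterion and the index name are hypothetical. As noted above, it assumes ROWIDs are never reused and always increase.

```python
import sqlite3

SETUP = """
CREATE TABLE live_table (row_id INTEGER PRIMARY KEY, col1 TEXT);
CREATE TABLE archive_table (row_id INTEGER, col1 TEXT);
-- Without this index the range filter on archive_table would still be a scan.
CREATE INDEX ix_archive_row_id ON archive_table (row_id);
"""


def move_rows(conn: sqlite3.Connection, col1_value: str) -> None:
    """Archive rows matching col1_value, then delete them from live_table."""
    with conn:  # both statements commit together
        conn.execute(
            "INSERT INTO archive_table (row_id, col1) "
            "SELECT row_id, col1 FROM live_table WHERE col1 = ?",
            (col1_value,),
        )
        # Archived ids below the smallest live id cannot match, so they are skipped.
        conn.execute(
            "DELETE FROM live_table WHERE row_id IN ("
            " SELECT row_id FROM archive_table"
            " WHERE row_id >= (SELECT MIN(row_id) FROM live_table))"
        )
```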
If ROWIDs are reused, you can use a datetime column from your data set instead of ROWID, as long as its value is close to when the record was live or archived. That way you only look for recently archived rows to delete from live_table, instead of the entire archive. Making [somedate] the clustered index key on archive_table could also improve performance. The rows would then be physically ordered by date, so the query would only need to read the tail of the table.
INSERT INTO archive_table
SELECT [ROWID], [COL1], [COL2]
FROM live_table
WHERE <criteria>;
-- Only look at rows archived in the last 30 days
DELETE FROM live_table
WHERE [ROWID] IN
(
    SELECT [ROWID] FROM archive_table WHERE [somedate] >= DATEADD(day, -30, GETDATE())
);
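Here is the date-window variant as a small SQLite sketch via Python's `sqlite3` module. It makes the answer's assumption that `somedate` is close to the time the row was archived; rows archived with an older `somedate` would be copied but not deleted. The `today` parameter replaces `GETDATE()` so the result is reproducible, and `col1 = ?` is a hypothetical stand-in for `<criteria>`.

```python
import sqlite3

SETUP = """
CREATE TABLE live_table (row_id INTEGER PRIMARY KEY, col1 TEXT, somedate TEXT);
CREATE TABLE archive_table (row_id INTEGER, col1 TEXT, somedate TEXT);
-- Lets the window filter read only the recent end of the archive
-- (in SQL Server the answer suggests making this the clustered index).
CREATE INDEX ix_archive_somedate ON archive_table (somedate);
"""


def move_rows(conn: sqlite3.Connection, col1_value: str, today: str, window_days: int = 30) -> None:
    """Archive matching rows, then delete them by looking only at recent archive rows."""
    with conn:  # both statements commit together
        conn.execute(
            "INSERT INTO archive_table (row_id, col1, somedate) "
            "SELECT row_id, col1, somedate FROM live_table WHERE col1 = ?",
            (col1_value,),
        )
        # e.g. date('2024-03-31', '-30 days') is '2024-03-01'; ISO dates compare as text.
        conn.execute(
            "DELETE FROM live_table WHERE row_id IN ("
            " SELECT row_id FROM archive_table WHERE somedate >= date(?, ?))",
            (today, f"-{window_days} days"),
        )
```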
#3
0
Your code snippet does not include a named transaction, which MUST be the first consideration. Second, design a table variable, temp table or permanent table to use for staging. The staging table should include a column with the same data type as the identity column of your source table, and that column should be indexed. Third, write your T-SQL to populate the staging table and copy rows from the source table to the destination table using a join between the source and staging tables. Then delete rows from the source table using the same join that moved the data to the destination table. Below is a working sample:
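--test setup below
DECLARE @live_table table (rowid int identity (1,1) primary key clustered, col1 varchar(1), col2 varchar(2));
DECLARE @archive_table table (rowid int, col1 varchar(1), col2 varchar(2));
INSERT @live_table (col1, col2)
VALUES
('a','a'),
('a','a'),
('a','a'),
('a','a'),
('b','b');
--test setup above
-- Note: table variables are not affected by ROLLBACK; with real tables the
-- transaction below makes the copy and the delete succeed or fail together.
SET XACT_ABORT ON; -- roll back the whole transaction on a run-time error
DECLARE @Staging table (ROWID int primary key); -- indexed set of keys to move
BEGIN TRANSACTION MoveData;
-- 1. Stage the keys of the rows to move
INSERT @Staging (ROWID)
SELECT lt.rowid
FROM @live_table AS lt
WHERE lt.col1 = 'a';
-- 2. Copy exactly the staged rows to the archive
INSERT INTO @archive_table (rowid, col1, col2)
SELECT lt.rowid, lt.col1, lt.col2
FROM @live_table AS lt
INNER JOIN @Staging AS s ON lt.rowid = s.ROWID;
-- 3. Delete the same rows from the source via the same join
DELETE lt
FROM @live_table AS lt
INNER JOIN @Staging AS s ON lt.rowid = s.ROWID;
COMMIT TRANSACTION MoveData;
SELECT * FROM @live_table;
SELECT * FROM @archive_table;
SELECT * FROM @Staging;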
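Table variables ignore `ROLLBACK`, so the sample above demonstrates the statement order rather than the all-or-nothing behaviour. The sketch below repeats the staging pattern against real tables in SQLite via Python's `sqlite3` module. A hypothetical `NOT NULL` constraint on the archive forces a failure, so you can see the staging, copy and delete roll back together. SQLite's `DELETE` has no `FROM ... JOIN`, so an `IN` against the staged keys stands in for the T-SQL join.

```python
import sqlite3

SETUP = """
CREATE TABLE live_table (row_id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT);
-- Hypothetical constraint, only here so a failed copy can be demonstrated.
CREATE TABLE archive_table (row_id INTEGER, col1 TEXT, col2 TEXT NOT NULL);
"""


def move_via_staging(conn: sqlite3.Connection, col1_value: str) -> None:
    """Stage keys, copy via a join on them, delete the same keys; one transaction."""
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS staging (row_id INTEGER PRIMARY KEY)")
    with conn:  # any exception below rolls back every step, including staging
        conn.execute("DELETE FROM staging")
        conn.execute(
            "INSERT INTO staging (row_id) SELECT row_id FROM live_table WHERE col1 = ?",
            (col1_value,),
        )
        conn.execute(
            "INSERT INTO archive_table (row_id, col1, col2) "
            "SELECT lt.row_id, lt.col1, lt.col2 "
            "FROM live_table AS lt JOIN staging AS s ON lt.row_id = s.row_id"
        )
        conn.execute("DELETE FROM live_table WHERE row_id IN (SELECT row_id FROM staging)")
```

If the archive insert raises, `with conn` rolls the transaction back and re-raises, so live_table is left exactly as it was.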