How to handle archiving of seasonal database values on SQL Server

Time: 2022-09-01 19:52:14

I am on SQL Server 2008 R2 and I am currently developing a database structure which contains seasonal values for some products.

By seasonal, I mean that those values won't be useful to customers after a particular date. However, internal staff will still use them for statistical results.

On the sales website, we will add a product search feature, and one of my aims is to make this search as fast as possible. But the more rows the database table holds, the slower this search will become. So I am considering archiving the unused values.

I can run the archiving automatically with SQL Server Agent jobs; no problem there. But I am not sure how I should archive those values.

The best way I can come up with is to create another table in the same database with the same columns and move the values there.

Example:

My main table is named ProductPrices, and a primary key has been defined for it. Then I created another table named ProductPrices_archive. I gave this table its own primary key field and the same columns as the ProductPrices table, except for the ProductPrices primary key. I don't think it is useful to archive that value (am I right?).

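A minimal sketch of this archive-table design follows. It uses Python's built-in sqlite3 module as a stand-in for SQL Server so it runs anywhere. The columns ProductID, Price and ValidUntil and the helper archive_expired are hypothetical and do not come from the question. The sketch shows that a scheduled job should copy and delete the expired rows inside one transaction, so a failure halfway cannot lose or duplicate a row. Dates are ISO strings, which compare correctly as text.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a live table and an archive table (hypothetical columns)."""
    conn.executescript("""
        CREATE TABLE ProductPrices (
            ProductPriceID INTEGER PRIMARY KEY,
            ProductID      INTEGER NOT NULL,
            Price          REAL    NOT NULL,
            ValidUntil     TEXT    NOT NULL  -- ISO date; useless to customers afterwards
        );
        -- Same columns except the live table's key; the archive has its own key.
        CREATE TABLE ProductPrices_archive (
            ArchiveID  INTEGER PRIMARY KEY,
            ProductID  INTEGER NOT NULL,
            Price      REAL    NOT NULL,
            ValidUntil TEXT    NOT NULL
        );
    """)


def archive_expired(conn: sqlite3.Connection, today: str) -> int:
    """Move rows that expired before `today` into the archive; return how many moved."""
    with conn:  # one transaction: commit on success, roll back on any error
        conn.execute(
            "INSERT INTO ProductPrices_archive (ProductID, Price, ValidUntil) "
            "SELECT ProductID, Price, ValidUntil FROM ProductPrices WHERE ValidUntil < ?",
            (today,),
        )
        cur = conn.execute("DELETE FROM ProductPrices WHERE ValidUntil < ?", (today,))
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    conn.executemany(
        "INSERT INTO ProductPrices (ProductID, Price, ValidUntil) VALUES (?, ?, ?)",
        [(1, 9.99, "2022-06-30"), (2, 4.50, "2022-12-31"), (3, 7.25, "2022-08-15")],
    )
    print(archive_expired(conn, "2022-09-01"))  # 2
```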

For internal use, I am considering combining the values of the two tables with UNION (is that the correct way?).

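For the internal statistics, a view over both tables is one way to combine them. This is again a sketch with sqlite3 and the same hypothetical columns, and the view name AllProductPrices is also made up. It uses UNION ALL rather than UNION. UNION ALL simply appends the two result sets. UNION also removes duplicate rows, which costs extra work and could silently merge two genuinely different records that happen to have identical values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ProductPrices (
        ProductPriceID INTEGER PRIMARY KEY,
        ProductID INTEGER NOT NULL, Price REAL NOT NULL, ValidUntil TEXT NOT NULL);
    CREATE TABLE ProductPrices_archive (
        ArchiveID INTEGER PRIMARY KEY,
        ProductID INTEGER NOT NULL, Price REAL NOT NULL, ValidUntil TEXT NOT NULL);

    -- Reporting view: UNION ALL appends both tables without a duplicate-removal pass.
    -- The Source column records which table each row came from.
    CREATE VIEW AllProductPrices AS
        SELECT ProductID, Price, ValidUntil, 'live' AS Source FROM ProductPrices
        UNION ALL
        SELECT ProductID, Price, ValidUntil, 'archive' AS Source FROM ProductPrices_archive;

    INSERT INTO ProductPrices (ProductID, Price, ValidUntil)
        VALUES (2, 4.50, '2022-12-31');
    INSERT INTO ProductPrices_archive (ProductID, Price, ValidUntil)
        VALUES (1, 9.99, '2022-06-30'), (3, 7.25, '2022-08-15');
""")

# Internal statistics read the view; the customer-facing search reads ProductPrices only.
rows = conn.execute(
    "SELECT Source, COUNT(*) FROM AllProductPrices GROUP BY Source ORDER BY Source"
).fetchall()
print(rows)  # [('archive', 2), ('live', 1)]
```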

This database is meant to be used for a long time, so it should be designed with the best possible structure. I am not sure whether I am missing something here for the long run.

Any advice would be appreciated.

2 answers

#1

Initially, I'd consider one of two options:

  • Use partitioning to separate the single table into the current working set and archive data.
    There is no need for an archive table.

  • Add ValidFrom and ValidTo columns to implement a Type 2 SCD (slowly changing dimension).
    Then add an indexed view filtered on ValidTo IS NULL to get the current set of data.

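A sketch of the Type 2 SCD option follows, with sqlite3 standing in for SQL Server. The set_price helper, the CurrentProductPrices view and the columns are hypothetical. SQLite has no indexed views, so this uses a plain view plus a partial index on the current rows (WHERE ValidTo IS NULL) as a rough stand-in. It does not reproduce the SQL Server indexed view or the NOEXPAND hint mentioned in this answer. Each price change closes the current row by setting its ValidTo and inserts a new row whose ValidTo is NULL, so the history stays in the same table.

```python
import sqlite3

SCHEMA = """
CREATE TABLE ProductPrices (
    ProductPriceID INTEGER PRIMARY KEY,
    ProductID      INTEGER NOT NULL,
    Price          REAL    NOT NULL,
    ValidFrom      TEXT    NOT NULL,
    ValidTo        TEXT              -- NULL means "current version"
);
-- Stand-in for the indexed view: SQLite has no indexed views, so a partial
-- index covers only the current rows instead.
CREATE INDEX IX_ProductPrices_Current
    ON ProductPrices (ProductID) WHERE ValidTo IS NULL;
CREATE VIEW CurrentProductPrices AS
    SELECT ProductID, Price, ValidFrom FROM ProductPrices WHERE ValidTo IS NULL;
"""


def set_price(conn: sqlite3.Connection, product_id: int, price: float, effective: str) -> None:
    """Type 2 change: close the current version, then add a new current one."""
    with conn:  # both statements commit together
        conn.execute(
            "UPDATE ProductPrices SET ValidTo = ? WHERE ProductID = ? AND ValidTo IS NULL",
            (effective, product_id),
        )
        conn.execute(
            "INSERT INTO ProductPrices (ProductID, Price, ValidFrom, ValidTo) "
            "VALUES (?, ?, ?, NULL)",
            (product_id, price, effective),
        )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
set_price(conn, 1, 9.99, "2021-11-01")
set_price(conn, 1, 12.49, "2022-11-01")  # new season's price supersedes the old one
set_price(conn, 2, 4.50, "2022-06-01")

print(conn.execute(
    "SELECT ProductID, Price FROM CurrentProductPrices ORDER BY ProductID"
).fetchall())  # [(1, 12.49), (2, 4.5)]
print(conn.execute("SELECT COUNT(*) FROM ProductPrices").fetchone()[0])  # 3: history kept
```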

I wouldn't have two separate tables if all the data has to be "online" in one database.

This leads to a third option: an entirely separate database with all the data, where only "current" data stays in the live database (as @Mike_Walsh's answer explains).

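A rough sketch of the third option follows, under these assumptions. sqlite3's ATTACH DATABASE stands in for a second SQL Server database. The history schema name, the nightly_load and row_count helpers, and the columns are all hypothetical. The reporting copy receives every row exactly once, and the live table is trimmed to the rows customers can still use. The live key uses AUTOINCREMENT so that IDs of deleted rows are never reused, because the "not yet copied" check depends on that.

```python
import sqlite3

live = sqlite3.connect(":memory:")
# Stand-in for the separate reporting database (hypothetical name "history").
live.execute("ATTACH DATABASE ':memory:' AS history")
live.executescript("""
    -- AUTOINCREMENT so IDs of deleted rows are never reused; the load relies on that.
    CREATE TABLE main.ProductPrices (
        ProductPriceID INTEGER PRIMARY KEY AUTOINCREMENT,
        ProductID INTEGER NOT NULL, Price REAL NOT NULL, ValidUntil TEXT NOT NULL);
    CREATE TABLE history.ProductPrices (
        ProductPriceID INTEGER PRIMARY KEY,
        ProductID INTEGER NOT NULL, Price REAL NOT NULL, ValidUntil TEXT NOT NULL);
""")


def nightly_load(conn: sqlite3.Connection, today: str) -> None:
    """Copy new rows to the history database, then trim expired rows from live."""
    with conn:  # both steps commit together or not at all
        # History gets every row (current or expired) exactly once.
        conn.execute("""
            INSERT INTO history.ProductPrices
            SELECT p.* FROM main.ProductPrices AS p
            WHERE NOT EXISTS (SELECT 1 FROM history.ProductPrices AS h
                              WHERE h.ProductPriceID = p.ProductPriceID)
        """)
        # Live keeps only the rows customers can still use.
        conn.execute("DELETE FROM main.ProductPrices WHERE ValidUntil < ?", (today,))


def row_count(conn: sqlite3.Connection, table: str) -> int:
    # Table names come from this script only, never from user input.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


live.executemany(
    "INSERT INTO main.ProductPrices (ProductID, Price, ValidUntil) VALUES (?, ?, ?)",
    [(1, 9.99, "2022-06-30"), (2, 4.50, "2022-12-31")],
)
nightly_load(live, "2022-09-01")
print(row_count(live, "main.ProductPrices"), row_count(live, "history.ProductPrices"))  # 1 2
```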

The indexed view option is the easiest, and it works with Standard Edition (using the NOEXPAND hint).

#2

gbn brings up some good approaches. I think the "right" longer-term answer for you is the third option, though.

It sounds like you have two business use cases for your data:

1.) Real-time Online Transaction Processing (OLTP). This covers POS transactions, inventory management, and quick questions such as "How did receipts look today? How is inventory? Are we having any operational problems?" It keeps the business running day to day. Here you want the data necessary to conduct operations, in a database optimized for updates, inserts, etc.

2.) Analytical questions/reporting. This means looking at month-over-month numbers, year-over-year numbers, and running averages. These are strategic questions that look at the complete picture of your history. You'll want to see how last year's Christmas seasonal items did against this year's, and maybe even compare those numbers with the seasonal items from the same period five years ago. Here you want a database that contains a lot more data than your OLTP database. You want to throw away as little history as possible, and you want a database largely optimized for reporting and answering questions, probably more denormalized. You also want the ability to see things as they were at a certain time, so the Type 2 SCDs mentioned by gbn would be useful here.

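The sketch below makes "seeing things as they were at a certain time" concrete with a year-over-year query against a Type 2 price history. sqlite3 stands in for the reporting database. The PriceHistory and Sales tables and their rows are invented for illustration. The sketch assumes half-open validity ranges: ValidFrom is inclusive, ValidTo is exclusive, and NULL means current. Each December sale is priced with the version that was valid on the sale date, so this season's prices do not rewrite last season's figures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Type 2 history: each row is valid from ValidFrom up to (not including) ValidTo.
    CREATE TABLE PriceHistory (
        ProductID INTEGER NOT NULL, Price REAL NOT NULL,
        ValidFrom TEXT NOT NULL, ValidTo TEXT);
    CREATE TABLE Sales (
        ProductID INTEGER NOT NULL, SaleDate TEXT NOT NULL, Quantity INTEGER NOT NULL);

    INSERT INTO PriceHistory VALUES
        (1, 10.0, '2020-01-01', '2022-01-01'),
        (1, 12.0, '2022-01-01', NULL);
    INSERT INTO Sales VALUES
        (1, '2021-12-20', 5),
        (1, '2022-12-18', 4),
        (1, '2022-07-01', 100);
""")

# December revenue per year, priced with the version that was valid on each sale date.
CHRISTMAS_REVENUE_BY_YEAR = """
    SELECT strftime('%Y', s.SaleDate) AS SaleYear,
           SUM(s.Quantity * p.Price)  AS Revenue
    FROM Sales AS s
    JOIN PriceHistory AS p
      ON p.ProductID = s.ProductID
     AND p.ValidFrom <= s.SaleDate
     AND (p.ValidTo IS NULL OR s.SaleDate < p.ValidTo)
    WHERE strftime('%m', s.SaleDate) = '12'
    GROUP BY SaleYear
    ORDER BY SaleYear
"""

print(conn.execute(CHRISTMAS_REVENUE_BY_YEAR).fetchall())
# [('2021', 50.0), ('2022', 48.0)]
```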

It sounds to me like you need to create a reporting database. You can call it a data warehouse, but that term scares people these days. It doesn't need to be scary: if you plan it properly, it doesn't have to take you 6 years and 6 million dollars to build ;-)

This is definitely a longer-term answer, but in a couple of years you'll be happy you spent the time creating one. A good book for understanding dimensional modeling and data warehouse concepts and terminology is The Data Warehouse Toolkit.
