Anchor database modeling - is there a better way to store history and allow rolling back records?

Date: 2022-09-11 14:54:27

I have read about anchor modelling at http://www.anchormodeling.com/ - there are a lot of publications that made sense to me. I am very concerned about the performance, though: storing so many records in a property table and always working with the most recent one should drain memory and processor speed. The authors claim that this is not the case. Is there any better modelling technique to store history and allow roll-back of the records?

1 solution

#1



Normally querying data in an Anchor modeled database falls within two categories:

  1. OLTP-like queries, retrieving a large number of attributes using high selectivity conditions
  2. OLAP-like queries, retrieving a small number of attributes using low selectivity conditions

In (1), the high selectivity, which often constrains the results to those belonging to a single instance, quickly pinpoints the desired instance, after which the joins involved add a small overhead. The joins are, however, made on declared PK/FK relations over tables already sorted by the single integer key corresponding to the identity of the instance. In other words, in a 6NF model (which is what provides the most temporal features), it is not possible to create a physical implementation that would perform better. As a case example, the Swedish insurance company Länsförsäkringar has been running a real-time master data management system using Anchor since 2005, containing about 10 million engagements for 3 million customers, without performance issues. That being said, if extremely many queries are going to be run in parallel, the added overhead may become an issue.
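
To make the OLTP-like case concrete, here is a minimal sketch in Python with the standard `sqlite3` module. It is not the schema the Anchor tooling generates; the table and column names (`customer`, `customer_name`, `customer_city`, `changed_at`) and the helper `customer_as_of` are made up for illustration. Each attribute lives in its own historized table keyed by the integer identity plus a change time, and one instance is read as of a chosen moment, which is also one way to look at "rolling back" a record without deleting anything.

```python
import sqlite3


def build_db():
    """Create a tiny in-memory database: one anchor, two historized attributes."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (cu_id INTEGER PRIMARY KEY);
        CREATE TABLE customer_name (
            cu_id      INTEGER NOT NULL REFERENCES customer(cu_id),
            changed_at TEXT    NOT NULL,   -- ISO dates compare correctly as text
            name       TEXT    NOT NULL,
            PRIMARY KEY (cu_id, changed_at)
        );
        CREATE TABLE customer_city (
            cu_id      INTEGER NOT NULL REFERENCES customer(cu_id),
            changed_at TEXT    NOT NULL,
            city       TEXT    NOT NULL,
            PRIMARY KEY (cu_id, changed_at)
        );
    """)
    conn.executemany("INSERT INTO customer VALUES (?)", [(1,), (2,)])
    # History is append-only: a change is a new row, old rows are kept.
    conn.executemany("INSERT INTO customer_name VALUES (?, ?, ?)", [
        (1, "2020-01-01", "Alice Smith"),
        (1, "2021-06-01", "Alice Jones"),
        (2, "2020-03-01", "Bob Berg"),
    ])
    conn.executemany("INSERT INTO customer_city VALUES (?, ?, ?)", [
        (1, "2020-01-01", "Stockholm"),
        (1, "2022-02-01", "Uppsala"),
        (2, "2020-03-01", "Lund"),
    ])
    return conn


def customer_as_of(conn, cu_id, as_of):
    """Return (cu_id, name, city) for one customer as known at `as_of`."""
    return conn.execute("""
        SELECT c.cu_id, n.name, a.city
        FROM customer AS c
        LEFT JOIN customer_name AS n
               ON n.cu_id = c.cu_id
              AND n.changed_at = (SELECT MAX(n2.changed_at)
                                  FROM customer_name AS n2
                                  WHERE n2.cu_id = c.cu_id
                                    AND n2.changed_at <= ?)
        LEFT JOIN customer_city AS a
               ON a.cu_id = c.cu_id
              AND a.changed_at = (SELECT MAX(a2.changed_at)
                                  FROM customer_city AS a2
                                  WHERE a2.cu_id = c.cu_id
                                    AND a2.changed_at <= ?)
        WHERE c.cu_id = ?
    """, (as_of, as_of, cu_id)).fetchone()


conn = build_db()
print(customer_as_of(conn, 1, "9999-12-31"))  # (1, 'Alice Jones', 'Uppsala')
print(customer_as_of(conn, 1, "2021-01-01"))  # (1, 'Alice Smith', 'Stockholm')
```

Every join and every "latest version" lookup here is driven by the composite primary key `(cu_id, changed_at)`, so the amount of history stored for other customers does not affect the lookup for a single one. Real deployments would of course need to be measured on their own data.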

In (2), since you are retrieving a small number of attributes, the number of joins is reduced. In addition, the selectivity introduced by conditions makes the joins behave like indexes (provided you have a cost-based optimizer that uses column statistics). An optimal join order will be produced using the most selective condition first, so that intermediate result sets become as small as possible as early as possible with respect to the involved joins. As an additional benefit, the 6NF structure in Anchor maps directly onto distribution mechanisms in massively parallel processing relational databases, providing the best possible distribution for ad-hoc querying. As a case example, avito.ru has a 55TB data warehouse built using Anchor on a 12-node Vertica cluster, running without performance issues. In fact, this solution outperformed many of the other solutions they tested, including NoSQL alternatives.
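
The OLAP-like case can be sketched the same way, again in Python with `sqlite3` and with hypothetical names throughout (`customer_segment`, `customer_city`, `customer_name`, `customers_per_city`). The point of the sketch is only that a query over two attributes reads two attribute tables; the third one is never touched, however wide the model is. SQLite's planner is much simpler than an MPP database's, so this illustrates the query shape, not the join-order behaviour described above.

```python
import sqlite3


def build_db():
    """Three historized attribute tables hanging off the same integer identity."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (cu_id INTEGER PRIMARY KEY);
        CREATE TABLE customer_name (
            cu_id INTEGER NOT NULL REFERENCES customer(cu_id),
            changed_at TEXT NOT NULL, name TEXT NOT NULL,
            PRIMARY KEY (cu_id, changed_at));
        CREATE TABLE customer_city (
            cu_id INTEGER NOT NULL REFERENCES customer(cu_id),
            changed_at TEXT NOT NULL, city TEXT NOT NULL,
            PRIMARY KEY (cu_id, changed_at));
        CREATE TABLE customer_segment (
            cu_id INTEGER NOT NULL REFERENCES customer(cu_id),
            changed_at TEXT NOT NULL, segment TEXT NOT NULL,
            PRIMARY KEY (cu_id, changed_at));
    """)
    conn.executemany("INSERT INTO customer VALUES (?)", [(1,), (2,), (3,)])
    conn.executemany("INSERT INTO customer_city VALUES (?, ?, ?)", [
        (1, "2020-01-01", "Stockholm"),
        (1, "2022-02-01", "Uppsala"),
        (2, "2020-03-01", "Lund"),
        (3, "2021-05-01", "Uppsala"),
    ])
    conn.executemany("INSERT INTO customer_segment VALUES (?, ?, ?)", [
        (1, "2020-01-01", "basic"),
        (1, "2021-01-01", "premium"),
        (2, "2020-03-01", "premium"),
        (2, "2022-01-01", "basic"),
        (3, "2021-05-01", "premium"),
    ])
    return conn


def customers_per_city(conn, segment):
    """Count customers per current city, restricted to a current segment.

    Only customer_segment and customer_city are read; customer_name and the
    anchor table are not part of the query at all.
    """
    return conn.execute("""
        SELECT c.city, COUNT(*) AS customers
        FROM customer_segment AS s
        JOIN customer_city AS c ON c.cu_id = s.cu_id
        WHERE s.segment = ?
          AND s.changed_at = (SELECT MAX(s2.changed_at)
                              FROM customer_segment AS s2
                              WHERE s2.cu_id = s.cu_id)
          AND c.changed_at = (SELECT MAX(c2.changed_at)
                              FROM customer_city AS c2
                              WHERE c2.cu_id = c.cu_id)
        GROUP BY c.city
        ORDER BY c.city
    """, (segment,)).fetchall()


conn = build_db()
print(customers_per_city(conn, "premium"))  # [('Uppsala', 2)]
print(customers_per_city(conn, "basic"))    # [('Lund', 1)]
```

Customer 2 was once in the premium segment but is not counted there, because only the most recent segment row per customer qualifies.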

In conclusion, I would say that you cannot find a better modeling technique if you need to support temporality and flexibility. I have to point out, though, that I am one of the authors of the technique. That said, what I have described has been proven both in practice and in theory, with scientific papers to back up the claims.

