Is indexing a boolean or enum-like property in Datastore a bad idea for fast writes?

Posted: 2022-11-21 15:50:10

It is well documented that writing at a high rate to an entity kind whose keys or indexed properties hold monotonically increasing values is bad for performance.

What about indexing entities on boolean properties, or on properties with enum-like values such as gender?

My guess is that indexing a low-cardinality property will suffer from the same problem, because there is no built-in type for such properties. But maybe there is special treatment for boolean properties?

1 answer

#1


3  

Cloud Datastore has optimizations in place for low-cardinality data such as booleans and enums. Each index entry also contains the entity key, which allows our underlying Bigtable tablets to split efficiently and hence handle a larger load. This works because we don't need to consider sort order among entries with the same value, so having them randomly distributed within their own key space makes no difference to queries, and the entity key is guaranteed to be unique, so we avoid collisions.
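To make the point about write distribution concrete, here is a toy model in Python. It is only a sketch under stated assumptions, not how Datastore or Bigtable is implemented: an index is modeled as a sorted list of `(value, entity_key)` tuples, and entity keys are modeled as random 64-bit integers (the `random_key` helper is hypothetical). It compares where new rows land in a boolean index versus an index on a monotonically increasing timestamp.

```python
import bisect
import random


def insert_positions(existing_rows, new_rows):
    """Insert new_rows one by one into a sorted index and report where each
    one lands, as a fraction of the index length (0.0 = start, 1.0 = end)."""
    index = sorted(existing_rows)
    positions = []
    for row in new_rows:
        i = bisect.bisect_left(index, row)
        positions.append(i / len(index))
        index.insert(i, row)
    return positions


def random_key(rng):
    # Hypothetical stand-in for an entity key; real keys are not plain integers.
    return rng.getrandbits(64)


rng = random.Random(0)

# Index on a boolean property: rows are (value, entity_key).
bool_existing = [(rng.random() < 0.5, random_key(rng)) for _ in range(1000)]
bool_new = [(rng.random() < 0.5, random_key(rng)) for _ in range(200)]
bool_positions = insert_positions(bool_existing, bool_new)

# Index on a monotonically increasing timestamp: every new row sorts last.
ts_existing = [(t, random_key(rng)) for t in range(1000)]
ts_new = [(t, random_key(rng)) for t in range(1000, 1200)]
ts_positions = insert_positions(ts_existing, ts_new)

print("boolean index insert range:", min(bool_positions), max(bool_positions))
print("timestamp index insert range:", min(ts_positions), max(ts_positions))
```

In this model the timestamp rows all land at the very end of the index, which is the hotspot pattern the question describes, while the boolean rows are spread across both the `False` and `True` ranges because the trailing entity key decides their position. A range of rows sharing one value can therefore be split at any key boundary.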

When we index a value we also add a 'scatter key' property to the end, which is essentially a randomized integer. This scatter key can then be used for query splitting later, allowing things like Cloud Dataflow to efficiently parallelize queries against this dataset.
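As an illustration of how a random scatter value can drive query splitting, here is a minimal Python sketch. It is an assumption-laden toy, not Datastore's or Dataflow's actual algorithm: entities are modeled as `(scatter_value, entity_key)` pairs, and the function names `choose_split_keys` and `assign_shard` and the `samples_per_split` parameter are made up for this example. The idea shown is that the entries with the smallest scatter values form a pseudo-random sample of keys, and evenly spaced keys from that sample can serve as shard boundaries.

```python
import bisect
import random


def choose_split_keys(scattered, num_splits, samples_per_split=32):
    """Pick num_splits - 1 boundary keys from (scatter_value, entity_key) pairs.

    The pairs with the smallest scatter values act as a pseudo-random sample
    of the key space; evenly spaced keys from that sample become boundaries.
    """
    sample_size = (num_splits - 1) * samples_per_split
    sample = sorted(scattered)[:sample_size]
    keys = sorted(key for _, key in sample)
    if not keys:
        return []
    step = len(keys) / num_splits
    return [keys[int(step * i)] for i in range(1, num_splits)]


def assign_shard(key, split_keys):
    # Shard i holds keys in [split_keys[i - 1], split_keys[i]).
    return bisect.bisect_right(split_keys, key)


rng = random.Random(0)
# Toy dataset: integer keys 0..9999, each with a random scatter value.
entities = [(rng.getrandbits(32), key) for key in range(10_000)]

split_keys = choose_split_keys(entities, num_splits=4)
shard_sizes = [0] * 4
for _, key in entities:
    shard_sizes[assign_shard(key, split_keys)] += 1

print("split keys:", split_keys)
print("shard sizes:", shard_sizes)
```

Each shard corresponds to a contiguous key range, so a parallel worker could run the same query restricted to its own range. How even the shards come out depends on the sample size, which is why the sketch takes several samples per split rather than one.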
