What is the right way to define entities in the datastore when handling massive amounts of data?

Time: 2022-12-08 12:01:49

I'm making an instant messaging app for Android, using Java and App Engine for the backend.

To store conversations and messages in the backend, I see 2 options.

Option 1: create 2 root entities, conversation (ID, message IDs) and message (ID, "text").

OR

Option 2: create a conversation (ID) entity and store each message (ID, "text") as a child of its conversation entity.

Though technically both can work, I do not understand the limits of the datastore (e.g. 1 write/sec for some entities), and I am worried about CPU overhead when querying, as well as about having potentially millions of message root entities. I am not sure whether ancestor entities are required, or even the best fit, for such an application.

tl;dr: what is the best way to architect such a database?

1 solution

#1


Do not use ancestor queries unless you are sure they fit your needs. To me this was the most confusing part of the datastore, because at first parent/child seems like a great way to structure data as a tree.
In short, use them when you must have immediate consistency when you write data. They come with several restrictions regarding total size and writes per second.
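
To make the parent/child idea concrete, here is a minimal plain-Java sketch of how keys nest. It is only a model, not the App Engine API: the `Key` class, `root()`, `path()` and `sameGroup()` are names I made up for illustration. The assumption it encodes is that an entity's root ancestor defines its entity group, and that the consistency guarantee and the write limit apply per group.

```java
/** Plain-Java model of datastore keys; NOT the App Engine API. */
public class KeySketch {
    /** A key is a kind plus an id, optionally nested under a parent key. */
    static final class Key {
        final Key parent;   // null for a root entity
        final String kind;
        final long id;

        Key(Key parent, String kind, long id) {
            this.parent = parent;
            this.kind = kind;
            this.id = id;
        }

        /** The root ancestor identifies the entity group this key belongs to. */
        Key root() {
            return parent == null ? this : parent.root();
        }

        /** Path form such as "Conversation:1/Message:10". */
        String path() {
            String self = kind + ":" + id;
            return parent == null ? self : parent.path() + "/" + self;
        }
    }

    /** Two keys are in the same entity group when they share a root ancestor. */
    static boolean sameGroup(Key a, Key b) {
        return a.root().path().equals(b.root().path());
    }

    public static void main(String[] args) {
        Key conversation = new Key(null, "Conversation", 1);

        // Option 2 from the question: messages are children of the conversation.
        Key childA = new Key(conversation, "Message", 10);
        Key childB = new Key(conversation, "Message", 11);

        // Option 1 from the question: every message is its own root entity.
        Key rootA = new Key(null, "Message", 10);
        Key rootB = new Key(null, "Message", 11);

        System.out.println(childA.path());             // Conversation:1/Message:10
        System.out.println(sameGroup(childA, childB)); // true
        System.out.println(sameGroup(rootA, rootB));   // false
    }
}
```

With child keys, every message of a conversation lands in one group, so writes to that conversation share one budget. With root keys, each message is its own group and writes do not contend with each other.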

Don't worry about having millions of "root" entities. This is precisely what the datastore (and NoSQL in general) is good at.
All datastore queries are efficient; it won't even let you run one that isn't (so you must add all needed indexes beforehand), so don't worry about query performance unless you can't express the query with an index.
In your case, given that a conversation shouldn't be huge and users normally don't type more than 5 entries per second, you could use ancestors, and you would gain immediate consistency within each conversation.
At this point I think it's too broad to ask for the architecture, but this should point you the right way.
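
As a rough illustration of why index-backed queries stay cheap even with millions of root entities, here is a toy model in plain Java. It is not the App Engine API: `IndexSketch`, the `conversationId` and `sentAt` properties, and the zero-padded key format are all assumptions of mine. The idea it shows is that a query becomes one range scan over a sorted index, so its cost follows the number of results rather than the total number of messages stored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy model of an index-backed query; NOT the App Engine API. */
public class IndexSketch {
    // Sorted index on (conversationId, sentAt) -> message text.
    // Assumes non-negative values and a unique sentAt within a conversation.
    private final TreeMap<String, String> index = new TreeMap<>();

    // Zero-padding makes string order match numeric order.
    private static String indexKey(long conversationId, long sentAt) {
        return String.format("%019d/%019d", conversationId, sentAt);
    }

    void put(long conversationId, long sentAt, String text) {
        index.put(indexKey(conversationId, sentAt), text);
    }

    /** All messages of one conversation, oldest first, via a single range scan. */
    List<String> messagesOf(long conversationId) {
        String from = indexKey(conversationId, 0);
        String to = indexKey(conversationId, Long.MAX_VALUE);
        return new ArrayList<>(index.subMap(from, true, to, true).values());
    }

    public static void main(String[] args) {
        IndexSketch store = new IndexSketch();
        store.put(1, 200, "b");
        store.put(2, 150, "x");
        store.put(1, 100, "a");

        System.out.println(store.messagesOf(1)); // [a, b]
        System.out.println(store.messagesOf(3)); // []
    }
}
```

This corresponds to option 1 from the question, where each message is a root entity that carries its conversation ID as a property. The trade-off is that such a query would not get the per-conversation immediate consistency that the ancestor layout gives.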
