Datastore vs. Cloud SQL in Google App Engine

Time: 2022-08-22 11:15:20

I want to build an application that will serve a lot of people (more than 2 million), so I think I should use Google Cloud Datastore. However, I also know there is an option to use Google Cloud SQL and still serve a lot of people using MySQL (like Facebook and YouTube do).

Is it correct to assume I should use Datastore rather than the relational Cloud SQL with this many users? Thank you in advance.

3 answers

#1


It is not strictly true that Facebook and YouTube are using MySQL to serve the majority of their content to the majority of their users. They both mainly use very large NoSQL stores (Cassandra and BigTable) for scalability, and probably use MySQL for smaller scale work that demands more complex relational storage. Try to use Datastore if you can, because you can start for free and will also save money when handling large volumes of data.

#2


To give an intelligent answer, I would need to know a lot more about your app. But... I'll outline the biggest gotchas I've found...

Google Datastore is effectively a distributed hierarchical data store. To get the scalability they wanted, Google had to make some compromises. As a developer you will find that these range from easy to work around, to difficult to work around, to impossible to work around. The last is far more common than you would ever assume.

If you are accustomed to relational databases and the ability to manipulate data across multiple tables within the same transaction, you are likely to pull your hair out with Datastore.

The biggest(?) gotcha is that transactions are only supported across a limited number of entity groups (5 at the time of writing). To give a simple example, say you had a parent-child relationship and needed to update child records under more than 5 parents at the same time within one transaction... it can't be done (yes, really).

If you reorganize your data structures and put all of the former child records under a single entity so they can be updated in a single transaction, you will run into another limitation... you can't reliably update the same entity group more than about once per second (yes, really).

And if you query an entity type across parents without specifying the root entity of each (a non-ancestor query), you will get what is euphemistically referred to as "eventual consistency"... which means the results are not guaranteed to be consistent (yes, really).
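
To make the entity-group limit concrete, here is a small pure-Python sketch that counts how many entity groups a set of keys touches. It assumes each key path is written as a tuple alternating kind and id, e.g. `("Parent", 1, "Child", 7)`, with the first kind/id pair being the root entity. The helpers `entity_group`, `count_entity_groups` and `fits_in_one_transaction` are hypothetical and are not part of any Datastore client library. The limit of 5 is the figure quoted in this answer, and the real service limit may have changed since.

```python
from typing import Hashable, Iterable, Tuple

# Hypothetical key representation: a tuple alternating kind and id,
# e.g. ("Parent", 1, "Child", 7). The first (kind, id) pair is the root entity.
KeyPath = Tuple[Hashable, ...]

# Limit quoted in the answer ("5 at the time of writing"); kept as a parameter
# because the real service limit may have changed since.
XG_ENTITY_GROUP_LIMIT = 5


def entity_group(path: KeyPath) -> Tuple[Hashable, Hashable]:
    """Return the (kind, id) of the root entity that owns this key."""
    if len(path) < 2:
        raise ValueError(f"key path needs at least a kind and an id: {path!r}")
    return (path[0], path[1])


def count_entity_groups(paths: Iterable[KeyPath]) -> int:
    """Number of distinct entity groups a transaction over these keys would touch."""
    return len({entity_group(p) for p in paths})


def fits_in_one_transaction(paths: Iterable[KeyPath], limit: int = XG_ENTITY_GROUP_LIMIT) -> bool:
    """True if all keys fall within at most `limit` entity groups."""
    return count_entity_groups(paths) <= limit


if __name__ == "__main__":
    # 10 children under a single parent: one entity group.
    same_parent = [("Parent", 1, "Child", i) for i in range(10)]
    print(count_entity_groups(same_parent), fits_in_one_transaction(same_parent))  # prints: 1 True

    # One child under each of 6 parents: 6 entity groups, over the limit.
    six_parents = [("Parent", i, "Child", 1) for i in range(6)]
    print(count_entity_groups(six_parents), fits_in_one_transaction(six_parents))  # prints: 6 False
```

Moving all children under one parent makes the first case pass. As the answer notes, though, that single entity group is then limited to roughly one sustained write per second.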

The above is all in Google's documentation, but you are likely to gloss over it if you are just getting started (of course it can handle it!).
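
The eventual-consistency point is easier to see in a toy model. The sketch below is a hypothetical in-memory store and does not reflect how Datastore is implemented. Writes land in the authoritative entity map immediately, and an ancestor query reads that map directly. A query across all parents reads a separate index, which only catches up when `apply_pending()` runs; this stands in for asynchronous index replication. The class and method names are invented for illustration.

```python
from typing import Dict, Hashable, List, Tuple

KeyPath = Tuple[Hashable, ...]  # hypothetical: ("Parent", 1, "Child", 7)


class ToyEventualStore:
    """Toy model: strongly consistent ancestor queries vs. eventually consistent global queries."""

    def __init__(self) -> None:
        self.entities: Dict[KeyPath, str] = {}        # authoritative data
        self.index: Dict[KeyPath, str] = {}           # lagging copy used by global queries
        self.pending: List[Tuple[KeyPath, str]] = []  # writes not yet applied to the index

    def put(self, path: KeyPath, value: str) -> None:
        self.entities[path] = value
        self.pending.append((path, value))

    def apply_pending(self) -> None:
        """Simulate the index catching up with earlier writes."""
        for path, value in self.pending:
            self.index[path] = value
        self.pending.clear()

    def ancestor_query(self, root: KeyPath) -> List[str]:
        """Values under `root`, read from authoritative data, so every write is visible."""
        return sorted(v for p, v in self.entities.items() if p[: len(root)] == root)

    def global_query(self, kind: Hashable) -> List[str]:
        """Values of `kind` under any parent, read from the lagging index."""
        return sorted(v for p, v in self.index.items() if len(p) >= 2 and p[-2] == kind)


if __name__ == "__main__":
    store = ToyEventualStore()
    store.put(("Parent", 1, "Child", 1), "first")
    print(store.ancestor_query(("Parent", 1)))  # ['first']
    print(store.global_query("Child"))          # [] because the index has not caught up yet
    store.apply_pending()
    print(store.global_query("Child"))          # ['first']
```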

#3


It depends on what you mean by 'a lot of people', what sort of data you have, and what you want to do with it.

Cloud SQL is designed for applications that need a SQL database, which can handle any query you can write in SQL, and ensures your data is always in a consistent state.
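
For contrast with the Datastore limits in the previous answer, here is a relational transaction that spans several tables. This sketch uses Python's built-in `sqlite3` module as a stand-in for MySQL on Cloud SQL. The SQL is generic, but connection setup and isolation behavior differ on a real server. The `accounts`/`transfers` schema and the `transfer` function are hypothetical. Either all three writes take effect or none do.

```python
import sqlite3
from typing import Dict


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with two related tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.execute("CREATE TABLE transfers (src INTEGER, dst INTEGER, amount INTEGER)")
    conn.executemany("INSERT INTO accounts (id, balance) VALUES (?, ?)", [(1, 100), (2, 0)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: int) -> bool:
    """Update two rows and insert an audit row atomically; return False if rolled back."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("INSERT INTO transfers (src, dst, amount) VALUES (?, ?, ?)", (src, dst, amount))
            (balance,) = conn.execute("SELECT balance FROM accounts WHERE id = ?", (src,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
    except ValueError:
        return False  # all three writes above were undone together
    return True


def balances(conn: sqlite3.Connection) -> Dict[int, int]:
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


if __name__ == "__main__":
    db = make_db()
    print(transfer(db, 1, 2, 30), balances(db))   # True {1: 70, 2: 30}
    print(transfer(db, 1, 2, 500), balances(db))  # False {1: 70, 2: 30}
```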

Cloud SQL can serve up to 3200 concurrent queries, depending on the tier. If the queries are simple and can be served from RAM, each should take just a few ms; assuming your users issue about 1 request per second, it could then support tens of thousands of simultaneously active users. If, however, they are running more complex queries such as searches, or writing a lot of data, it will support fewer.
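
The capacity estimate above is an application of Little's law: the number of queries in flight equals the query rate times the time each query takes. The sketch below is a back-of-the-envelope calculation. It assumes each active user sends a steady stream of requests and that the tier's concurrent-query limit is the only bottleneck. The 5 ms and 20 ms latencies and the queries-per-request figure are hypothetical. CPU, disk I/O and lock contention usually saturate first, so treat the result as a ceiling, not a forecast.

```python
def max_active_users(
    max_concurrent_queries: int,
    query_latency_ms: int,
    requests_per_user_per_sec: int = 1,
    queries_per_request: int = 1,
) -> int:
    """Idealized upper bound on simultaneously active users, via Little's law.

    Little's law: concurrency = throughput * latency. With concurrency capped at
    max_concurrent_queries and each query taking query_latency_ms, the database can
    finish at most max_concurrent_queries * 1000 / query_latency_ms queries per second.
    Dividing by the query rate each user generates gives the user count.
    Integer arithmetic keeps the result exact (rounded down).
    """
    queries_per_user_per_sec = requests_per_user_per_sec * queries_per_request
    return (max_concurrent_queries * 1000) // (query_latency_ms * queries_per_user_per_sec)


if __name__ == "__main__":
    # 3200 concurrent queries (the top figure in the answer), 5 ms per query, 1 query per request.
    print(max_active_users(3200, 5))                          # 640000
    # Same limit, but each request issues 5 queries of 20 ms (hypothetical heavier page).
    print(max_active_users(3200, 20, queries_per_request=5))  # 32000
```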

If you have a simple set of queries, are less concerned about immediate consistency, or expect much more traffic, then you should look at Datastore.
