How do I scale a stateful Node.js application?

I am currently working on a web-based MMORPG and would like to set up an auto-scaling strategy based on Docker and DigitalOcean droplets.

However, I am wondering how I could manage to do so:

My game server would have to be splittable across different Docker containers, but all game server instances should together behave as if they were one gigantic game server. That means every change happening in one instance (for example, a character moving) should also be mirrored in every other instance.

I am trying to get this to work (at least conceptually) but can't find a way to synchronize all my instances properly. Should I use a master that only broadcasts events, or is there an alternative?

I have the same question about my MySQL database: since every game server would have to read from and write to the DB, how would I make it scale properly as the game gets bigger and bigger? The best solution I could think of was to keep the database on a single, very powerful server.

I understand that this would be easy if the game servers didn't have to "share" their state, but the design is primarily intended to let me scale quickly in case of a sudden spike of activity.

(There will be different "global" game servers like A, B, C, ... but each of those global game servers should, behind the scenes, be composed of 1 to X Docker containers running the "real" game server, so that the "global" game server is only a concept.)

3 solutions

#1

The problem you state is too generic, so it is difficult to give a concrete answer. However, let me be reckless and give you some general-purpose scaling advice:

Remove counters from databases. Instead of primary keys that are auto-incremented IDs, try to assign random UUIDs.

Replace data that must be validated against a central point with data that is self-contained. For example, for authentication, instead of keeping the user credentials in a DB, use JSON Web Tokens that can be verified by any host.

Use techniques such as consistent hashing to balance the load without needing load balancers. Of course, use hash functions that distribute well, to avoid or minimize collisions.
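The consistent hashing idea from the advice above can be sketched roughly as follows. This is a minimal illustration, not a production implementation: the `HashRing` class, the server names, and the default of 100 virtual nodes per server are all assumptions made for the example. Only Node's built-in `crypto` module is used.

```javascript
const crypto = require("node:crypto");

// Map a string to a 32-bit unsigned integer position on the ring.
function hashToInt(value) {
  return crypto.createHash("sha256").update(value).digest().readUInt32BE(0);
}

// Hypothetical helper class, written for this example only.
class HashRing {
  // More virtual nodes per server spread the keys more evenly.
  constructor(servers = [], virtualNodes = 100) {
    this.virtualNodes = virtualNodes;
    this.ring = []; // sorted array of { point, server }
    for (const server of servers) this.add(server);
  }

  add(server) {
    for (let i = 0; i < this.virtualNodes; i++) {
      this.ring.push({ point: hashToInt(`${server}#${i}`), server });
    }
    this.ring.sort((a, b) => a.point - b.point);
  }

  remove(server) {
    this.ring = this.ring.filter((entry) => entry.server !== server);
  }

  // Walk clockwise: the first ring point >= hash(key) owns the key,
  // wrapping around to the first point if none is larger.
  lookup(key) {
    if (this.ring.length === 0) throw new Error("ring is empty");
    const h = hashToInt(key);
    let lo = 0;
    let hi = this.ring.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (this.ring[mid].point < h) lo = mid + 1;
      else hi = mid;
    }
    return this.ring[lo % this.ring.length].server;
  }
}

const ring = new HashRing(["game-1", "game-2", "game-3"]);
const playerId = crypto.randomUUID(); // random UUID instead of an auto-increment ID
console.log(playerId, "->", ring.lookup(playerId));
```

Any host that knows the server list computes the same owner for a key, so no central balancer is consulted. When a server is removed, only the keys it owned move elsewhere; all other keys keep their owner.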

The advice above is basically about changing the design to migrate from stateful to stateless in as many aspects as you can. If you need stateful parts anyway, try to guess which entities are most likely to share stateful data and place them on the same (or a nearby) server. For example, if there are cities in your game, try to place users who are in the same city on the same server, since they are more likely to interact with each other (and share stateful data) than users who are in different cities.

Of course, if a city is too big and very crowded, you will probably need to partition it across several servers to avoid overloading a single one.

#2

Your question is too broad and, as others have mentioned, describes a general scaling problem. It would have been helpful if you had stated your system requirements more clearly.

If it has to be real-time, you can choose Redis as your main DB, but then you'd need slaves (for replication) and you would not be able to scale automatically as you go*, since Redis doesn't support that. I assume that's not a good option when you're working with games (sudden spikes are likely).

*There seem to be some managed solutions; you would need to check them out.

If it can be near real-time, Apache Kafka can prove useful.

There's also a highly scalable DB called CockroachDB that has everything you need (I'm a contributor, yay!), but you need to run tests to see whether it meets your latency requirements.

Overall, going with a single very powerful server is a bad choice, since there's a ceiling and scaling vertically would cost you more.

#3

There's a great benefit in scaling such an application horizontally. I'll try to write down some ideas.

Option 1 (stateful):

When planning a stateful application, you need to take care of synchronising the state (via pub/sub, network broadcasting, or something else) and be aware that every synchronisation takes time to occur (unless you block on each operation). If this is OK for you, let's go ahead.

Let's say you have 80k operations per second across your whole cluster. That means every process needs to synchronise 80k state changes per second. This will be your bottleneck. Handling 80k changes per second is quite a big challenge for a Node.js application (because its JavaScript runs on a single thread, so heavy work blocks the event loop).
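To make the synchronisation cost concrete, here is a small sketch of the broadcast approach. It uses Node's built-in `EventEmitter` as an in-process stand-in for a real message bus; that substitution, the `GameInstance` class, and the `move` event shape are assumptions made for illustration only. In a real deployment the bus would be a network service and delivery would not be instantaneous.

```javascript
const { EventEmitter } = require("node:events");

// In-process stand-in for a real pub/sub channel (assumption for the demo).
const bus = new EventEmitter();

// Hypothetical game server instance that mirrors all state changes.
class GameInstance {
  constructor(name) {
    this.name = name;
    this.positions = new Map(); // characterId -> { x, y }
    this.applied = 0; // number of state changes this instance processed
    bus.on("move", (event) => this.apply(event));
  }

  // Called when a player connected to THIS instance moves a character.
  moveCharacter(characterId, x, y) {
    bus.emit("move", { origin: this.name, characterId, x, y });
  }

  // Every instance applies every event, including events from other instances.
  apply({ characterId, x, y }) {
    this.positions.set(characterId, { x, y });
    this.applied += 1;
  }
}

const instances = ["a", "b", "c"].map((name) => new GameInstance(name));
instances[0].moveCharacter("hero", 3, 4);
instances[1].moveCharacter("orc", 9, 1);

for (const inst of instances) {
  console.log(inst.name, inst.applied, Object.fromEntries(inst.positions));
}
```

Each instance ends up applying both moves, regardless of which instance received them. Adding more instances therefore spreads the client connections, but it does not reduce the number of state changes each process has to apply.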

In the end, you'll need to determine precisely the maximum number of changes you want to be able to sync, and run some tests with different programming languages. The overhead of synchronising has to be added to the general workload of the application. It could be beneficial to use a language with good multithreading support, such as C, Java/Scala, or Go.

Option 2 (stateful with routing):

In some cases it's feasible to implement a different kind of scaling. If, for example, your application can be broken down into areas of a map, you could start with one application replica that holds the full map; when it scales up, the map is divided proportionally among the replicas. You'll need to implement some routing between the application servers, for example: to change the state in city A of world B, call server xyz. This could be done automatically, but scaling down will be a challenge.

This solution requires more care and knowledge about the application and is not as fault-tolerant as option 1, but it could scale endlessly.
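The routing part of option 2 might look something like the sketch below. The `RegionRouter` class and the server names are hypothetical; the sketch only covers the lookup table, not the state migration that has to happen when a region changes owner.

```javascript
// Hypothetical routing table: "world/city" -> name of the owning server.
class RegionRouter {
  constructor(defaultServer) {
    this.defaultServer = defaultServer; // owns every region not listed below
    this.owners = new Map();
  }

  static key(world, city) {
    return `${world}/${city}`;
  }

  // Scaling up: hand one region over to another server.
  assign(world, city, server) {
    this.owners.set(RegionRouter.key(world, city), server);
  }

  // Scaling down: the region falls back to the default server.
  // Its state would have to be migrated first, which is not shown here.
  release(world, city) {
    this.owners.delete(RegionRouter.key(world, city));
  }

  serverFor(world, city) {
    return this.owners.get(RegionRouter.key(world, city)) ?? this.defaultServer;
  }
}

const router = new RegionRouter("server-1");
router.assign("B", "A", "server-xyz"); // city A of world B lives on server xyz
console.log(router.serverFor("B", "A"));
console.log(router.serverFor("B", "C"));
```

Starting with a single default owner mirrors the idea of one replica holding the full map. Regions are handed off one at a time as load grows.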

Option 3 (stateless):

Move the state to some other application and solve the problem there (for example Redis, etcd, ...).
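As a rough sketch of what "moving the state elsewhere" means for the game server code: the handler below keeps nothing in process memory between calls. The `InMemoryStore` class is only a stand-in so the example runs on its own; its async `get`/`set` shape is an assumption for illustration and not the API of any particular Redis or etcd client.

```javascript
// Stand-in for an external state store (assumption: a real deployment would
// wrap a client for Redis, etcd, or a similar service behind this shape).
class InMemoryStore {
  constructor() {
    this.data = new Map();
  }
  async get(key) {
    return this.data.get(key);
  }
  async set(key, value) {
    this.data.set(key, value);
  }
}

// The handler holds no state of its own: everything is read from and written
// back to the store, so any instance could serve any request.
async function handleMove(store, characterId, dx, dy) {
  const key = `pos:${characterId}`;
  const current = (await store.get(key)) ?? { x: 0, y: 0 };
  const next = { x: current.x + dx, y: current.y + dy };
  await store.set(key, next);
  return next;
}

async function demo() {
  const store = new InMemoryStore();
  await handleMove(store, "hero", 1, 0);
  return handleMove(store, "hero", 2, 5);
}

demo().then((pos) => console.log(pos));
```

Note that the read-then-write in `handleMove` is not atomic. With several instances writing to a shared store, the store's own atomic operations or transactions would be needed to avoid lost updates, and the scaling problem then becomes the store's problem.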
