How to handle multiple database results from requests to different servers

I have cloud statistics (structured data: CSV) that I have to expose to administrators and users.

But for scalability, the data will be collected by multiple machines (perf monitors), each connected to its own DB.

Now a Manager (Mgr) is responsible for multicasting the request to all perf monitors and collecting the overall stats data needed to satisfy a single UI request.

So my questions are:

1) How do I sort the data from multiple monitors at the Mgr according to the client request? Each monitor can return its results as the client requested, but how do I then merge the data from multiple machines in Java? In other words, how do I perform in-memory SQL aggregate/scalar functions (e.g. GROUP BY, ORDER BY, AVG) at the Mgr over all the results retrieved from multiple clusters? How do I implement DB SQL aggregate/scalar functionality on the Java side; are there any known APIs? I think what I need is the Reduce part of Hadoop's MapReduce technique.

2) A request from the UI (say, select count(*) from DB where Memory > 1000MB) has to be forwarded to multiple machines. How do I send parallel requests to the individual monitors and consume the results only when all the nodes have responded? That is, how do I make the user thread wait until all the responses from the perf monitors have been consumed? How do I trigger parallel REST requests on the Mgr for a single UI request?

3) Do I have to authenticate the UI user at both the Mgr and the perf monitors?

4) Do you see any drawbacks in this approach?

Notes:

1) I didn't go for NoSQL because the data is structured and no joins are required.

2) I didn't go for Node.js since I am new to it and it may take more time to develop with. Also, I am not developing anything concurrency-critical where a single-threaded model is best suited. Only pushing/retrieving of data happens here; no data is modified.

3) I want an individual DB for each monitor, OR at least two DB instances with multiple clusters per instance, to support faster access to real-time BIG statistical data.

5 solutions

#1

You want to scale your app, but you designed in an inherent bottleneck: the Mgr.

What I would do is split the Mgr into at least two parts: a front end and a back end. The front end could simply be an aggregator and/or controller that collects all the requests from all the different UI servers, timestamps them, and puts them in a queue (RabbitMQ, Kafka, Redis, whatever) as messages carrying the UI session ID or something similar that uniquely identifies the source of the request. Then you just wait until you get a response on the queue (on a different topic, of course).

Then on your back end (the other side of the queue) you can set up as many nodes as your load requires and have them all perform the same task: pull requests off the queue and call the performance-monitoring APIs as necessary. You can scale these back-end nodes as much as you wish, since they are stateless; all the state that needs to be stored is already part of the messages in the queue, which Redis/Kafka/RabbitMQ (or whatever else you choose) will automagically persist for you.
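
A minimal in-process sketch of the front-end/back-end split described above, assuming Java 16+ for records. Two LinkedBlockingQueue instances stand in for the request and response topics of a real broker (RabbitMQ, Kafka, Redis, ...), so nothing here is persisted or distributed. The class name, the per-request ID (added so one UI session can have several requests in flight), and the placeholder result string are all hypothetical.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.*;

public class QueueSplitSketch {
    record Request(String requestId, String sessionId, long timestampMillis, String query) {}
    record Response(String requestId, String result) {}

    // In-process stand-ins for two topics on a real broker.
    final BlockingQueue<Request> requestTopic = new LinkedBlockingQueue<>();
    final BlockingQueue<Response> responseTopic = new LinkedBlockingQueue<>();
    // Front end: requests still waiting for an answer, keyed by request ID.
    final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();
    final ExecutorService threads = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r);
        t.setDaemon(true); // let the JVM exit without an explicit shutdown
        return t;
    });

    QueueSplitSketch(int backEndWorkers) {
        // Back end: identical, stateless workers; add more of them to absorb more load.
        for (int i = 0; i < backEndWorkers; i++) {
            threads.execute(() -> {
                try {
                    while (true) {
                        Request r = requestTopic.take();
                        // Hypothetical: query the perf monitors here.
                        responseTopic.put(new Response(r.requestId(), "result for " + r.query()));
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        // Front end: route each response back to the request waiting for it.
        threads.execute(() -> {
            try {
                while (true) {
                    Response resp = responseTopic.take();
                    CompletableFuture<String> f = pending.remove(resp.requestId());
                    if (f != null) f.complete(resp.result());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }

    // Called once per UI request: timestamp it, enqueue it, wait for the answer.
    String handle(String sessionId, String query, long timeoutMs) throws Exception {
        String id = UUID.randomUUID().toString();
        CompletableFuture<String> f = new CompletableFuture<>();
        pending.put(id, f); // register before enqueueing so no response is missed
        requestTopic.put(new Request(id, sessionId, System.currentTimeMillis(), query));
        try {
            return f.get(timeoutMs, TimeUnit.MILLISECONDS);
        } finally {
            pending.remove(id); // no-op if the response already arrived
        }
    }

    public static void main(String[] args) throws Exception {
        QueueSplitSketch mgr = new QueueSplitSketch(3);
        System.out.println(mgr.handle("session-42", "count where memory > 1000MB", 1000));
    }
}
```

The back-end workers keep no state of their own, so adding workers (or, with a real broker, whole machines) raises throughput without changing the front end.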

You can also use Apache Storm or something similar to do this for you in the back end, since it was designed for exactly this kind of application.

Apache Storm also has built-in merging capability, exposed through the Trident API.

A note on authentication: you should authenticate the HTTP requests on the front-end side, and then you will be all right. Just assign unique IDs (most probably session IDs) to the users connected to your Mgr and use this internal ID when you forward requests further to downstream servers.
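
A sketch of that authentication note, assuming Java 11+ for java.net.http. The front end authenticates the user once, maps them to an opaque internal ID, and attaches only that ID to the requests it forwards to the monitors. The header name X-Internal-Session-Id and the class and method names are hypothetical, and the scheme assumes the monitors are reachable only from the Mgr (e.g. on a private network); if they are exposed more widely, they would still need their own check.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class InternalIdForwarding {
    // Hypothetical header name; not a standard HTTP header.
    static final String INTERNAL_ID_HEADER = "X-Internal-Session-Id";

    // Authenticated user -> opaque internal ID, issued once at the front end.
    private final Map<String, String> internalIds = new ConcurrentHashMap<>();

    // Call only after the front end has authenticated the incoming HTTP request.
    String internalIdFor(String authenticatedUser) {
        return internalIds.computeIfAbsent(authenticatedUser, u -> UUID.randomUUID().toString());
    }

    // Downstream request: monitors see the internal ID, never the user's credentials.
    HttpRequest downstreamRequest(String authenticatedUser, URI monitorUri) {
        return HttpRequest.newBuilder(monitorUri)
                .header(INTERNAL_ID_HEADER, internalIdFor(authenticatedUser))
                .GET()
                .build();
    }

    public static void main(String[] args) {
        InternalIdForwarding fwd = new InternalIdForwarding();
        HttpRequest req = fwd.downstreamRequest("alice", URI.create("http://monitor-1.example/stats"));
        System.out.println(req.headers().firstValue(INTERNAL_ID_HEADER).orElse("missing"));
    }
}
```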

Now how do I send parallel requests to the individual monitors and consume the results only when all the nodes have responded? That is, how do I make the user thread wait until all the responses from the perf monitors have been consumed? How do I trigger parallel REST requests on the Mgr for a single UI request?

Well, if you have so many questions about handling user connections and serving responses to those clients, I would suggest picking up a book on the Java Servlet API. You might want to read this one, for example: Servlet & JSP: A Tutorial (A Tutorial series). It is a bit outdated but well written.

But with all due respect, if you have so many questions on these quite fundamental topics, then it might be better to leave the architecture design to someone more experienced.

#2

Don't reinvent the wheel: use good existing BAM (business activity monitoring) and database monitoring tools. They have lots of built-in dashboards and statistics and are easy to connect with Java and workflows.

#3

But for scalability, the data will be collected by multiple machines (perf monitors), each connected to its own DB.

Approximately what sort of scale do you anticipate? Hundreds of GBs? Multiple terabytes? The reason I ask is that these days SQL Server and Oracle can handle really large volumes of data. Once the data is collected in a central DB, it's game over as far as searching and crunching are concerned.

Now a Manager (Mgr) is responsible for multicasting the request to all perf monitors and collecting the overall stats data needed to satisfy a single UI request.

Writing this will be a major task, and it will be really complex, IMHO. That said, I am not an expert in this area.

#4

What I would do is put a layer of Hazelcast or Infinispan or something similar in your Performance Monitor. The Performance Monitor logic itself can be part of the data grid. MySQL then works as persistent storage for this data grid. In this sense you can have more than one MySQL instance, and each one holds just a portion of the data; it simply works as an extension that lets you go beyond your maximum RAM. As you scale your performance monitors over time, you also scale your persistence capacity.
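
To illustrate the idea that each MySQL instance holds only a portion of the data, here is a plain-Java sketch of hash-based routing. It is not Hazelcast's or Infinispan's API; each HashMap merely stands in for one backing database, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PartitionedStoreSketch {
    // Each map stands in for one MySQL instance holding a slice of the data.
    private final List<Map<String, Double>> shards = new ArrayList<>();

    PartitionedStoreSketch(int shardCount) {
        for (int i = 0; i < shardCount; i++) shards.add(new HashMap<>());
    }

    // Route a key to a shard by hash; floorMod keeps the index non-negative.
    int shardFor(String key) {
        return Math.floorMod(key.hashCode(), shards.size());
    }

    void put(String key, double value) { shards.get(shardFor(key)).put(key, value); }

    Double get(String key) { return shards.get(shardFor(key)).get(key); }

    int sizeOfShard(int i) { return shards.get(i).size(); }

    public static void main(String[] args) {
        PartitionedStoreSketch store = new PartitionedStoreSketch(3);
        for (int i = 0; i < 30; i++) store.put("vm-" + i, i * 100.0);
        for (int i = 0; i < 3; i++) System.out.println("shard " + i + ": " + store.sizeOfShard(i));
    }
}
```

Plain modulo routing moves most keys when the shard count changes; handling that rebalancing for you is one of the reasons to use a data grid instead of rolling your own.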

Using MapReduce or other distributed functions for aggregation can then give you a massive amount of parallelism and the ability to serve significantly more requests. Such an architecture also scales horizontally. In the end it should look something like this:

And just as another note: in general, it is not necessary to have one MySQL for each Hazelcast node. That depends on what the goal is. I also kind of forgot the Manager in the diagram, but things there are simple: it can either work as a gateway to the data grid or be merged with the grid.
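
Distributed aggregation of this kind only works if each node returns a mergeable partial result rather than a finished one. A minimal Java 16+ sketch for AVG, with hypothetical names: each node returns (sum, count), and the reduce step adds those up before dividing once. COUNT and SUM merge by addition and MIN/MAX by taking the min/max; averaging the per-node averages would be wrong whenever nodes hold different numbers of rows.

```java
import java.util.List;

public class PartialAggregates {
    // What each node (or each monitor's DB) returns: a sum and a count, not an average.
    record Partial(double sum, long count) {
        Partial merge(Partial other) {
            return new Partial(sum + other.sum, count + other.count);
        }
    }

    // Reduce step at the Mgr or in the grid: combine partials, then finish the average once.
    static double globalAverage(List<Partial> partials) {
        Partial total = partials.stream().reduce(new Partial(0, 0), Partial::merge);
        return total.count() == 0 ? Double.NaN : total.sum() / total.count();
    }

    public static void main(String[] args) {
        // Node A holds values 10 and 20 -> (30, 2); node B holds 60 -> (60, 1).
        List<Partial> partials = List.of(new Partial(30, 2), new Partial(60, 1));
        System.out.println(globalAverage(partials)); // 30.0
    }
}
```

In the example, averaging the per-node averages would give (15 + 60) / 2 = 37.5 instead of the correct 90 / 3 = 30.0.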

#5

Not sure if my answer will be useful to you, since this question was posted some time back.

I would like to answer it based on your questions, the problems in the current approach, and a proposed solution...

1) How do I sort the data from multiple monitors at the Mgr according to the client request? Each monitor can return its results as the client requested, but how do I then merge the data from multiple machines in Java? In other words, how do I perform in-memory SQL aggregate/scalar functions (e.g. GROUP BY, ORDER BY, AVG) at the Mgr over all the results retrieved from multiple clusters? How do I implement DB SQL aggregate/scalar functionality on the Java side; are there any known APIs? I think what I need is the Reduce part of Hadoop's MapReduce technique.

Java has provided a built-in database, Java DB, as part of the Java distribution (it was bundled with JDK 6 through 8); it is also available as the Apache Derby database. It can be used as an in-memory SQL database, or Java DB / Apache Derby can store the data on disk, so you won't lose the data after a restart. Check here: http://www.oracle.com/technetwork/java/javadb/overview/index.html https://db.apache.org/derby/

For the map-reduce part, a simple approach based on Java collections would work; I don't think you need any special MapReduce framework in this case. You should, however, consider out-of-memory errors, network bandwidth, etc. when you read data from multiple sources.
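
A minimal sketch of that collection-based approach, assuming Java 16+ and that each monitor returns raw rows. The StatRow shape and the method name are hypothetical; the stream pipeline plays the role of WHERE / GROUP BY / AVG / ORDER BY over the merged results. Every row is held in memory at once, so the out-of-memory caveat above applies.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class InMemoryAggregation {
    // Hypothetical row shape; the real columns depend on your CSV schema.
    record StatRow(String host, String metric, double value) {}

    // Roughly: SELECT host, AVG(value) ... WHERE metric = ? GROUP BY host ORDER BY AVG(value) DESC
    static List<Map.Entry<String, Double>> avgByHostDesc(List<List<StatRow>> perMonitorResults, String metric) {
        Map<String, Double> avgByHost = perMonitorResults.stream()
                .flatMap(List::stream)                                  // merge all monitors' rows
                .filter(r -> r.metric().equals(metric))                 // WHERE
                .collect(Collectors.groupingBy(StatRow::host,           // GROUP BY host
                        Collectors.averagingDouble(StatRow::value)));   // AVG(value)
        return avgByHost.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed()) // ORDER BY ... DESC
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<StatRow> monitorA = List.of(new StatRow("vm1", "memMB", 1200), new StatRow("vm2", "memMB", 800));
        List<StatRow> monitorB = List.of(new StatRow("vm1", "memMB", 1400), new StatRow("vm3", "memMB", 2000));
        System.out.println(avgByHostDesc(List.of(monitorA, monitorB), "memMB"));
        // [vm3=2000.0, vm1=1300.0, vm2=800.0]
    }
}
```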

2) A request from the UI (say, select count(*) from DB where Memory > 1000MB) has to be forwarded to multiple machines. How do I send parallel requests to the individual monitors and consume the results only when all the nodes have responded? That is, how do I make the user thread wait until all the responses from the perf monitors have been consumed? How do I trigger parallel REST requests on the Mgr for a single UI request?

Ideally, a NodeJS kind of application is really best suited to this case, where the application gets a callback whenever an HTTP call returns a response. However, you can implement the Observer pattern, as explained here: How do I perform a JAVA callback between classes?
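
The Observer pattern works, but a different, swapped-in technique may be simpler here: CompletableFuture (Java 8+). The sketch below (Java 9+ for List.of) fans one query out to every monitor in parallel and blocks the user thread until all of them have answered or a timeout expires. Each Supplier is a hypothetical stand-in for one REST call; with java.net.http.HttpClient (Java 11+), HttpClient.sendAsync already returns a CompletableFuture and could replace supplyAsync.

```java
import java.util.List;
import java.util.concurrent.*;
import java.util.function.Supplier;
import java.util.stream.Collectors;

public class FanOutSketch {
    // Sends one query to every monitor in parallel and blocks until all have answered
    // (or the timeout expires). Results come back in the same order as the monitors.
    static <T> List<T> queryAll(List<Supplier<T>> monitors, ExecutorService pool, long timeoutMs)
            throws InterruptedException, ExecutionException, TimeoutException {
        List<CompletableFuture<T>> futures = monitors.stream()
                .map(m -> CompletableFuture.supplyAsync(m, pool))
                .collect(Collectors.toList());
        // allOf completes only when every future is done; get() blocks the user thread until then.
        CompletableFuture.allOf(futures.toArray(new CompletableFuture<?>[0]))
                .get(timeoutMs, TimeUnit.MILLISECONDS);
        return futures.stream().map(CompletableFuture::join).collect(Collectors.toList());
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // Hypothetical per-monitor answers to: select count(*) ... where Memory > 1000MB
            List<Supplier<Long>> monitors = List.of(() -> 12L, () -> 7L, () -> 30L);
            long total = queryAll(monitors, pool, 2000).stream().mapToLong(Long::longValue).sum();
            System.out.println(total); // 49
        } finally {
            pool.shutdown();
        }
    }
}
```

On a timeout, get throws TimeoutException while the slower calls keep running; a real Mgr would have to decide whether to cancel them or return partial results.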

3) Do I have to authenticate the UI user at both the Mgr and the perf monitors?

It should be based on your requirements.

4) Do you see any drawbacks in this approach?

There are several drawbacks with this approach:

- Data should not be pulled on demand from the UI. At the least, the data should already be available in a centralised database whenever a request for it arrives. Pulling data from various endpoints is expensive.
- Stats must be collected periodically to maintain history, and reports must be generated based on a moving time window.
- The JVM might run out of memory (OutOfMemoryError) if large data needs to be processed. Proper handling is required.
- Large amounts of data might get transferred over the network every time there is a new request, possibly the same data again.
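
A minimal sketch of periodic collection with a moving time window, assuming Java 16+. A ScheduledExecutorService polls at a fixed rate, and reports are computed over the last windowMillis only. collectFromMonitors is a hypothetical placeholder, and the deque assumes samples arrive in timestamp order, which a single collector thread normally provides.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class MovingWindowStats {
    private record Sample(long timestampMillis, double value) {}

    private final Deque<Sample> samples = new ArrayDeque<>();
    private final long windowMillis;

    MovingWindowStats(long windowMillis) { this.windowMillis = windowMillis; }

    // Called by the periodic collector.
    synchronized void record(long timestampMillis, double value) {
        samples.addLast(new Sample(timestampMillis, value));
    }

    // Report over (now - window, now]; evicts samples that have fallen out of the window.
    synchronized double average(long nowMillis) {
        while (!samples.isEmpty() && samples.peekFirst().timestampMillis() <= nowMillis - windowMillis) {
            samples.removeFirst();
        }
        return samples.stream().mapToDouble(Sample::value).average().orElse(Double.NaN);
    }

    // Hypothetical stand-in for polling the perf monitors.
    static double collectFromMonitors() { return 42.0; }

    public static void main(String[] args) throws InterruptedException {
        MovingWindowStats stats = new MovingWindowStats(60_000); // 1-minute window
        ScheduledExecutorService collector = Executors.newSingleThreadScheduledExecutor();
        collector.scheduleAtFixedRate(
                () -> stats.record(System.currentTimeMillis(), collectFromMonitors()),
                0, 100, TimeUnit.MILLISECONDS);
        Thread.sleep(350);
        collector.shutdown();
        System.out.println(stats.average(System.currentTimeMillis()));
    }
}
```

Old samples are evicted only when a report is computed, so in practice the report would run on a schedule as well.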

Notes:

1) I didn't go for NoSQL because the data is structured and no joins are required.

NoSQL doesn't mean that no structure is followed. In fact, a NoSQL database is the best fit for data like this, where you don't update records and transactions etc. are not required.

2) I didn't go for Node.js since I am new to it and it may take more time to develop with. Also, I am not developing anything concurrency-critical where a single-threaded model is best suited. Only pushing/retrieving of data happens here; no data is modified.

NodeJS won't be a good choice since it is single-threaded. NodeJS should not be used when you have CPU-intensive jobs to perform, like yours.

3) I want an individual DB for each monitor, OR at least two DB instances with multiple clusters per instance, to support faster access to real-time BIG statistical data.

**I would rather suggest that you store the data in any database that can scale horizontally, and process it either as it arrives or in batches, so that your user experience is good.**
