How can I improve the performance of an MQ-based batch processing application?

I have an application that receives messages at a rate of 70K XMLs per hour. We consume these XML messages and store them in an intermediate queue. The intermediate queue exists because we need to meet an SLA of consuming all the messages within 24 hours. We are able to consume the XMLs and load them into the internal queue within 24 hours. After loading them into the internal queue, we process the XMLs (parse, apply a few transformations, perform a few validations) and store the data in a heavily normalized data model. I know that the data model can have a huge impact on performance; unfortunately, we have no control over it. Currently, we take 3.5 minutes to process 2K messages, which is unacceptable. We want to bring that down to 1 minute for 2K messages. Here is what we have done so far:

1) Applied indexes wherever applicable.
2) Use XMLBeans for parsing the XMLs (size of each XML is not very huge)
3) Removed all unnecessary validations, transformations, etc.

The application runs on:
Operating system: RHEL 5.4 64 bit
Platform: JDK 1.6.0_17, 64 bit
Database: Oracle 11g R2 64 bit (2 node cluster)
External MQ: IBM Queue
Internal temporary storage MQ: JBoss MQ
Application Server: JBoss 5.1.0.GA (EAP Version)

The order in which we consume and process the XML messages is very important, so we cannot process them in parallel.

Is there anything else we can do to improve performance?

2 Solutions

#1

WebSphere MQ, even on a small server, can unload messages MUCH faster than the rate you describe. The Windows Performance Report for WMQ V7 tested at more than 2,200 2k persistent round trips (one request and one reply) per second over client channels. That's more than 4,000 messages per second.

The bottleneck in your case would seem to be the latency of processing messages and the dependency on processing the messages in a particular order. The option that could give you the MOST performance boost would be to eliminate the order dependency. When I worked at a bank we had a system that posted transactions in the exact order they arrived and everyone said this requirement was mandatory. However, we eventually revised the system to perform a memo-post during the day and then repost in a later step. The memo-posting occurred in any order and supported parallelism, failover and all the other benefits of multi-instance processing. The final post applied the transactions in logical order (and in fact in an order that was most favorable to the customer) once they were all in the DB. Sequence dependencies lock you into a singleton model and are a worst-case requirement for asynch messaging. Eliminate them if at all possible.

The other area for improvement will be in the parsing and processing of the messages. As long as you are stuck with sequence dependencies, this is the best bet for improving performance.

Finally, you always have the option to throw money at the problem in the form of more memory, CPU, faster disk I/O and so forth. Essentially this is addressing software architecture with horsepower and is never the best solution but often it buys you enough time to address the root cause.

#2

Some suggestions outside of message delivery tuning since it appears this is not your [primary] bottleneck:

You mentioned you are storing data into a highly normalized database. This invariably means one or more reference data or PK lookups, which create several additional trips to the database to fetch this data. To avoid or reduce this, create a local cache with all your reference data and update the cache as you go. In-memory lookups will be significantly faster than a trip to the DB.
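
A rough sketch of such a cache, assuming reference data boils down to mapping a string code to a numeric key. The `ReferenceCache` class and its `Loader` callback are hypothetical names, and the code assumes the single-threaded processing described in the question. It sticks to JDK 1.6 syntax.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical read-through cache for reference data (code -> surrogate key). */
final class ReferenceCache {

    /** Hypothetical callback that performs the real database lookup on a miss. */
    interface Loader {
        Long load(String code);
    }

    private final ConcurrentMap<String, Long> ids = new ConcurrentHashMap<String, Long>();
    private final Loader loader;
    private int misses = 0; // plain counter: the processing loop is assumed to be serial

    ReferenceCache(Loader loader) {
        this.loader = loader;
    }

    /** Returns the key for a code, going to the database only on a cache miss. */
    Long idFor(String code) {
        Long id = ids.get(code);
        if (id == null) {
            id = loader.load(code); // the only place a DB round trip happens
            misses++;
            if (id != null) {
                ids.put(code, id);
            }
        }
        return id;
    }

    /** Call after inserting a new reference row so the cache stays current. */
    void put(String code, Long id) {
        ids.put(code, id);
    }

    int misses() {
        return misses;
    }
}
```

Watching `misses()` over a run gives a quick indication of how many lookups still reach Oracle.
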
If you feel you have insufficient RAM to cache all your decodes and reference data, shoot for a disk-based cache (e.g. EHCache, which will do RAM, Disk or Overflow) or a lightweight local DB like HyperSonic or H2, which will still give you better lookup times than a trip to Oracle (unless you're on the same host, and even then....)
Ultimately, if each message requires many round trips to the DB, you may benefit from migrating the processing of the message to the DB itself, where you can implement the process in PL/SQL or Java.
If writing one processed message to the database involves multiple inserts/updates, make sure to use prepared statement batching. This will send multiple inserts/updates in one call to the DB.
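
For illustration, batching with plain JDBC could look roughly like this. The `order_line` table, its columns and the `LineItemWriter` class are made up for the example; only `addBatch()` and `executeBatch()` are the point.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

/** Sketch of JDBC statement batching; table and column names are hypothetical. */
final class LineItemWriter {

    // Hypothetical child table of the normalized model.
    static final String INSERT_SQL =
        "INSERT INTO order_line (order_id, sku, quantity) VALUES (?, ?, ?)";

    /**
     * Queues one INSERT per row and submits them with a single executeBatch()
     * call. Each row is {sku, quantity}. Returns the number of statements sent.
     */
    static int insertLines(Connection con, long orderId, List<String[]> rows)
            throws SQLException {
        PreparedStatement ps = con.prepareStatement(INSERT_SQL);
        try {
            for (String[] row : rows) {
                ps.setLong(1, orderId);
                ps.setString(2, row[0]);
                ps.setInt(3, Integer.parseInt(row[1]));
                ps.addBatch(); // queued on the statement, nothing executed yet
            }
            return ps.executeBatch().length; // one call for all queued rows
        } finally {
            ps.close();
        }
    }
}
```

The connection would come from the JBoss DataSource; transaction handling is left out of the sketch.
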
Speaking of prepared statements, make sure your JBoss DataSource configuration for Oracle has prepared-statement-cache-size set to some number high enough to handle all your prepared statements created during processing (and not the default which is zero, or no caching).
The XML parser you are using may be imposing more overhead than is necessary, even (or especially) for small messages. If you are using JAXB, make sure you're not recreating the unmarshaller more than once (or more than necessary). Alternatively, try a Pull/Streaming parser. If you are using a DOM parser, the additional memory required may be causing a lot of garbage collection.
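
As one possible shape for the pull-parser option, here is a minimal StAX sketch (StAX ships with JDK 1.6). The `PullParserSketch` class and the flat "leaf element name to text" result are assumptions for the example, not a description of your message format.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Sketch: read the leaf element values of a small XML message with StAX. */
final class PullParserSketch {

    // Creating the factory involves a provider lookup, so build it once and reuse it.
    private static final XMLInputFactory FACTORY = XMLInputFactory.newInstance();

    /** Returns leaf element name -> trimmed text, in document order. */
    static Map<String, String> leafValues(String xml) throws XMLStreamException {
        Map<String, String> values = new LinkedHashMap<String, String>();
        XMLStreamReader reader = FACTORY.createXMLStreamReader(new StringReader(xml));
        try {
            String current = null;                  // element opened most recently
            StringBuilder text = new StringBuilder();
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    current = reader.getLocalName();
                    text.setLength(0);              // discard text seen before this tag
                } else if (event == XMLStreamConstants.CHARACTERS) {
                    text.append(reader.getText());
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    if (current != null) {          // closing a leaf, not a container
                        values.put(current, text.toString().trim());
                        current = null;
                    }
                }
            }
        } finally {
            reader.close();
        }
        return values;
    }
}
```

No tree is built here, so there is little per-message garbage compared with a DOM parse.
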
Silly thing, but worth mentioning, if you are executing a lot of logging for each message, that will be costing you time, so turn it off.
Using JBoss MQ as an intermediary buffer is elegant but it is probably not the fastest way to store your messages for deferred processing since the persistence is more complex and generalized for all sorts of JMS message types. On that note, if JBoss MQ is persisting to Oracle anyway, then it seems improbable that a custom persistence procedure would not be faster. If JBoss MQ is storing to HyperSonic (as it does by default), you can still probably outperform the store of the JMS message with some custom code. This will also mean that you will need a new mechanism to pull the message back out of the DB for processing, but as with the JMS store, a custom process may outperform the more generalized procedure implemented by JBoss MQ.
Storing intermediary messages to the DB may also give more query flexibility to determine where messages do not have to be serially processed. (Perhaps, for example, messages originating from different clients do not need to be processed in sequence). Of course, you can also do this with JBoss MQ by placing the appropriate headers in the intermediary messages. This would allow you to parallelize by using different selectors in multiple different message listeners/processors.
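
To make the idea concrete, here is an in-process sketch of the same partitioning, not the JMS selector mechanism itself. It assumes ordering only matters within one client, which you would have to confirm. `OrderedLanes` and the client id key are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Sketch: keep order per client while different clients are processed in parallel. */
final class OrderedLanes {

    private final ExecutorService[] lanes;

    OrderedLanes(int laneCount) {
        lanes = new ExecutorService[laneCount];
        for (int i = 0; i < laneCount; i++) {
            // One thread per lane, so work within a lane runs in submission order.
            lanes[i] = Executors.newSingleThreadExecutor();
        }
    }

    /** Every message of a given client maps to the same lane. */
    int laneFor(String clientId) {
        return (clientId.hashCode() & 0x7fffffff) % lanes.length;
    }

    void submit(String clientId, Runnable work) {
        lanes[laneFor(clientId)].execute(work);
    }

    /** Stops accepting work and waits for the queued work to finish. */
    boolean shutdownAndWait(long seconds) throws InterruptedException {
        for (ExecutorService lane : lanes) {
            lane.shutdown();
        }
        boolean done = true;
        for (ExecutorService lane : lanes) {
            done &= lane.awaitTermination(seconds, TimeUnit.SECONDS);
        }
        return done;
    }
}
```

With JMS the lane number would instead be written into a message header and each listener would select on its own value.
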

One quick item on messaging.....

You did not mention if you were using message driven beans with WebSphere MQ, but if you are, there is a setting in the Inbound Configuration called pollingInterval which, to quote from the docs, means:

If each message listener within a session has no suitable message on its queue, this is the maximum interval, in milliseconds, that elapses before each message listener tries again to get a message from its queue. If it frequently happens that no suitable message is available for any of the message listeners in a session, consider increasing the value of this property. This property is relevant only if TRANSPORT has the value BIND or CLIENT.

The default pollingInterval is 5000 ms. Your current message processing time is

(3.5 * 60 * 1000 / 2000)

= 105 ms per message.
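
The same arithmetic applied to the other figures in the question gives a feel for the budget. This is only a sketch over the quoted numbers; `ThroughputBudget` is a hypothetical helper.

```java
/** Per-message time budgets derived from the figures quoted in the question. */
final class ThroughputBudget {

    /** Milliseconds available per message when `messages` must finish in `minutes`. */
    static double msPerMessage(double minutes, int messages) {
        return minutes * 60 * 1000 / messages;
    }

    public static void main(String[] args) {
        // Current rate: 2K messages in 3.5 minutes.
        System.out.println("current: " + msPerMessage(3.5, 2000) + " ms");
        // Target rate: 2K messages in 1 minute.
        System.out.println("target:  " + msPerMessage(1.0, 2000) + " ms");
        // Arrival rate: 70K messages per hour, roughly one every 51 ms.
        System.out.println("arrival: " + msPerMessage(60.0, 70000) + " ms");
    }
}
```

So the target is 30 ms per message, and at 70K messages per hour a new message arrives about every 51 ms, which a serial 105 ms per message cannot keep pace with.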

If you introduce a 5000 ms pause here and there, that will seriously cut down on your throughput, so you might want to look into this by measuring the ongoing difference between the message enqueue time and the time that you receive the message in your JBoss message listener. The enqueue time can be determined from these message headers:

JMS_IBM_PutDate
JMS_IBM_PutTime
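
A sketch of turning those two values into a delay. It assumes they arrive as strings in the MQMD layouts, `yyyyMMdd` for the date and `HHmmssTH` (tenths and hundredths of a second) for the time, both in GMT; check that against your MQ version. `EnqueueDelay` is a hypothetical helper.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

/** Sketch: turn the JMS_IBM_PutDate / JMS_IBM_PutTime strings into a delay. */
final class EnqueueDelay {

    /**
     * Assumes putDate is "yyyyMMdd" and putTime is "HHmmssTH"
     * (T = tenths, H = hundredths of a second), both in GMT.
     */
    static long putMillis(String putDate, String putTime) throws ParseException {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMddHHmmss");
        fmt.setTimeZone(TimeZone.getTimeZone("GMT"));
        fmt.setLenient(false);
        long wholeSeconds = fmt.parse(putDate + putTime.substring(0, 6)).getTime();
        int hundredths = Integer.parseInt(putTime.substring(6, 8));
        return wholeSeconds + hundredths * 10L;
    }

    /** Delay between the put on the IBM queue and the moment the listener saw the message. */
    static long delayMillis(String putDate, String putTime, long receivedAtMillis)
            throws ParseException {
        return receivedAtMillis - putMillis(putDate, putTime);
    }
}
```

In the listener, the two strings would be read with `getStringProperty` on the message and compared with `System.currentTimeMillis()`. The result is only meaningful if the clocks of the two hosts are in sync.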

All in all, your best bet is going to be to figure out how to parallelize.

Good luck.

//Nicholas
