Publishing/subscribing large files as message payloads

We have an existing system that processes a lot of files on an ongoing basis. Roughly speaking, that is about 3 million files a day, ranging in size from a few kilobytes to more than 50 MB. These files go through several stages of processing between the time they are received and the time they are fully consumed, depending on the path they take. Due to the content and format of these files, they can NOT be broken up into smaller chunks.

Currently, the workflow these files move through is rigid and dictated by code with fixed inputs and outputs (in many cases, one subscriber becomes the publisher for a new set of files). This lack of flexibility is starting to cause us problems, so I'm looking at some kind of pub/sub solution that can handle new requirements.

Most traditional pub/sub solutions carry the data inside the message payload itself, but our largest files exceed the message size limits of many messaging platforms. Furthermore, we have multiple platforms in play: files progress through both Linux and Windows tiers depending on their path.

Does anyone have any design and/or implementation recommendations with the following goals in mind?
1. Multiplatform for both pub and sub (Linux and Windows)
2. Persistent storage/store-and-forward support
3. Can handle large event payloads and appropriately cleans up once all subscribers have been serviced
4. Routing/workflow is done via configuration
5. Subscribers can subscribe to a filtered set of published events based on changing criteria (e.g. only give me files of a specific type)

I've done a bunch of digging into a number of service bus and MQ implementations, but haven't quite been able to firm up enough of a design approach to properly evaluate what tools make the most sense. Thanks for any input.

2 solutions

#1

A1. I developed a similar system at my previous job. We didn't pass the multi-MB payload inside the message. Instead, we stored it on a file server and passed only the UNC file name (the messaging was Java RMI, but pretty much anything will work).
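
To make the idea concrete, here is a minimal Python sketch of that approach (store the file, send only a reference), extended with the cleanup asked for in requirement #3. It is not the system described above: the `PayloadStore` class, its method names, and the in-memory bookkeeping are all hypothetical, and a real deployment would keep the pending-subscriber list somewhere durable.

```python
from __future__ import annotations

import json
import shutil
import uuid
from pathlib import Path


class PayloadStore:
    """Hypothetical helper: keeps large files on shared storage, messages stay small."""

    def __init__(self, root: Path):
        # root is assumed to be a share every tier can reach,
        # e.g. a UNC path like \\server\share on Windows or an NFS mount on Linux.
        self.root = root
        # Maps stored path -> subscribers that have not acknowledged yet.
        # Assumption: in-memory only; this is lost on restart.
        self._pending: dict[str, set[str]] = {}

    def publish(self, source: Path, file_type: str, subscribers: list[str]) -> str:
        """Copy the file into the store and return a small JSON message referencing it."""
        stored = self.root / f"{uuid.uuid4().hex}{source.suffix}"
        shutil.copyfile(source, stored)
        self._pending[str(stored)] = set(subscribers)
        return json.dumps(
            {"path": str(stored), "type": file_type, "size": stored.stat().st_size}
        )

    def acknowledge(self, message: str, subscriber: str) -> bool:
        """Record that a subscriber is done; delete the file once nobody is waiting.

        Returns True if this acknowledgement triggered the cleanup.
        """
        path = json.loads(message)["path"]
        waiting = self._pending[path]
        waiting.discard(subscriber)
        if waiting:
            return False
        del self._pending[path]
        Path(path).unlink()
        return True
```

The message that actually travels over the bus is a few hundred bytes regardless of file size, so the transport's payload limit stops mattering. The trade-off is that the shared storage and the acknowledgement tracking become your responsibility.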

A2. I recently started using Windows Communication Foundation. Fortunately for me, I'm only supporting Windows, and I don't need such big messages. However, the documentation says the protocol is platform-independent, and there's the option to pass huge chunks of data using its streaming message transfer feature.

In both cases, I think you'll have to fulfill your #4 and #5 requirements in your own code.
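
As a rough idea of what "in your own code" could look like for #4 and #5, here is a small Python sketch where routing lives in a JSON config and each subscriber declares a filter. Everything in it is an assumption made for illustration: the config layout, the subscriber names, the `max_size_mb` field, and the `matches`/`route` helpers are not taken from any particular product.

```python
from __future__ import annotations

import fnmatch
import json

# Hypothetical routing config; in practice this would be loaded from a file
# so that the workflow can change without redeploying code.
ROUTING_CONFIG = json.loads("""
{
  "subscribers": [
    {"name": "linux-parser", "types": ["xml", "csv"], "max_size_mb": 50},
    {"name": "windows-renderer", "types": ["pdf"]},
    {"name": "archiver", "types": ["*"]}
  ]
}
""")


def matches(rule: dict, event: dict) -> bool:
    """Return True if the event passes the subscriber's type and size filters."""
    # fnmatchcase keeps matching case-sensitive on both Linux and Windows.
    type_ok = any(fnmatch.fnmatchcase(event["type"], p) for p in rule["types"])
    limit = rule.get("max_size_mb")
    size_ok = limit is None or event["size"] <= limit * 1024 * 1024
    return type_ok and size_ok


def route(config: dict, event: dict) -> list[str]:
    """Return the names of all subscribers that should receive this event."""
    return [r["name"] for r in config["subscribers"] if matches(r, event)]
```

For example, `route(ROUTING_CONFIG, {"type": "pdf", "size": 2048})` selects `windows-renderer` and `archiver`. Changing the criteria then means editing the config rather than the code, and the returned list is also the set of subscribers to wait for before cleaning up the file.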

#2

You may want to look into ActiveMQ if your clients are internal. ActiveMQ supports messages of up to 2 GB (I think) and also supports blob messages. It guarantees delivery and processing (with transactions).

Hope this helps.
