How do I store serialized data that can be queried?

I need to extract data from an incoming message that could be in any format. The extracted data to store is also dependent upon the format, i.e. format A could extract field X, Y, Z, but format B could extract field A, B, C. I also need to view Message B by searching for field C within the message.

Right now I configure and store an extraction strategy (XSLT) per format and execute it at runtime when the related format is encountered, and I store the extracted data in an Oracle database as an XmlType column. Oracle's development/support for XmlType seems pretty lax: it requires an old jar that forces you to use a very old DOM DocumentBuilderFactory implementation (it looks like Java 1.4 code), which collides with Spring 3 and doesn't play nicely with Hibernate. The XML queries are also slow and non-intuitive.

I'm concluding that Oracle with XmlType isn't a very good way to store the extracted data, so my question is, what is the best way to store the serialized/queryable data?

NoSQL (Cassandra, CouchDB, MongoDB, etc.)?
A JCR such as Jackrabbit?
A blob with manual serialization/deserialization?
Another Oracle solution?
Something else?

2 Answers

#1

One alternative that you haven't listed is using an XML database. (Note that Oracle is one of the ten or so XML database products.)

(Obviously, a blob type won't allow querying "inside" the persisted XML objects unless you read each blob instance into memory and do the querying there; e.g. using XSLT.)

#2

I have had great success storing complex XML objects in PostgreSQL. Combined with its functional index feature, you can even create indexes on node values of the stored XML documents and use them for very fast lookups via index scans, without having to reparse the XML.

However, this only works if you know your query patterns in advance; arbitrary XPath queries will still be slow.

Example (untested, so it may contain syntax errors):

Create a simple table:

create table test123 (
    id serial primary key,
    myxml xml  -- xml type, so xpath() can be applied to the column directly
);

Now let's assume you have XML documents like:

<test>
    <name>Peter</name>
    <info>Peter is a <i>very</i> good cook</info>
</test>
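
To try the remaining steps, the sample document first has to be stored in the table. A minimal sketch, assuming the test123 table from above with its myxml column (the id value is generated by the serial column):

```sql
-- Load the sample document; the string literal is converted to the column's type.
INSERT INTO test123 (myxml)
VALUES ('<test><name>Peter</name><info>Peter is a <i>very</i> good cook</info></test>');
```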

Now create a functional (expression) index:

-- assumes myxml has type xml; indexes the text of the first /test/name node
create index idx_test123_name on test123 (((xpath('/test/name/text()', myxml))[1]::text));

Now you can do fast XML lookups:

-- the WHERE expression must match the indexed expression exactly
SELECT myxml FROM test123 WHERE (xpath('/test/name/text()', myxml))[1]::text = 'Peter';
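
To check that a lookup like this really uses the index instead of evaluating the XPath for every row, you can look at the query plan. A sketch, assuming the column is named myxml, has type xml, and that an index exists on exactly the expression used in the WHERE clause:

```sql
-- Show the plan without running the query.
-- Assumes an expression index on (xpath('/test/name/text()', myxml))[1]::text.
EXPLAIN
SELECT myxml
FROM test123
WHERE (xpath('/test/name/text()', myxml))[1]::text = 'Peter';
```

On a table with only a handful of rows the planner may still choose a sequential scan, so this check is only meaningful with a realistic amount of data.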

You should also consider creating an index using text_pattern_ops, so you can have fast prefix lookups like:

SELECT myxml FROM test123 WHERE (xpath('/test/name/text()', myxml))[1]::text like 'Pe%';
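
The index for such prefix lookups could look roughly like the following. This is a sketch, not a tested statement: the index name idx_test123_name_prefix is made up for illustration, and it assumes the myxml column has type xml. The text_pattern_ops operator class is what allows a B-tree index to serve LIKE 'Pe%' patterns when the database does not use the C locale.

```sql
-- Hypothetical index name; assumes test123.myxml has type xml.
-- The operator class follows the parenthesized index expression.
CREATE INDEX idx_test123_name_prefix
    ON test123 (((xpath('/test/name/text()', myxml))[1]::text) text_pattern_ops);
```

Patterns that start with a wildcard, such as '%ter', cannot use this kind of index.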
