Best practice: parsing XML into a database in a Rails application with heavy validation logic

Time: 2022-07-14 22:26:27

I'm receiving big (around 120MB each), nested XML files. The parsing itself is very fast: I'm currently using Nokogiri's SAX parser (Nokogiri::XML::SAX::Parser), which is much faster than a DOM-based one. For many values I need to check back against the database to decide whether a record should be updated or not. Although I keep the number of database queries as low as possible (eager loading, plain SQL selects), the whole run is about 40x slower than parsing alone. I can't use mass inserts because I need to validate the data, check for existing records, and handle a lot of associations. The whole process runs inside a single transaction, which sped things up by about 1.5x. What approach would you take? I'm looking forward to any help! I'm not very experienced with XML. Would XSLT help me? I also have an XSD file for the files I receive.

Thanks in advance!

1 solution

#1

I ended up restructuring my associations so that they match the third-party data more closely, which lets me use mass inserts. (Watch out for the max_allowed_packet value!) I'm using the sax-machine gem. When most of the basic data is already in the database, I can now process a 120MB file, including the database work, in about 10 seconds, which is totally fine. Feel free to ask.
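
To illustrate the max_allowed_packet warning: a single multi-row INSERT that grows larger than that MySQL server setting is rejected, so a mass insert usually has to be split into several statements. The following is only a minimal sketch of one way to do that, not the code I actually run. The helper name build_insert_batches, the table and column names, and the 60-byte limit are made up for the example, and the value tuples are assumed to be quoted and escaped already.

```ruby
# Hypothetical helper (not part of Rails or sax-machine): groups already
# quoted value tuples into multi-row INSERT statements whose byte size
# stays at or below max_bytes.
# Note: a single tuple that is larger than max_bytes on its own is still
# emitted as its own statement; this sketch does not try to split it.
def build_insert_batches(table, columns, tuples, max_bytes:)
  prefix = "INSERT INTO #{table} (#{columns.join(', ')}) VALUES "
  batches = []
  current = []
  size = prefix.bytesize

  tuples.each do |tuple|
    # The ", " separator costs 2 bytes for every tuple after the first.
    needed = tuple.bytesize + (current.empty? ? 0 : 2)

    if !current.empty? && size + needed > max_bytes
      # Flush the current statement and start a new one with this tuple.
      batches << prefix + current.join(', ')
      current = []
      size = prefix.bytesize
      needed = tuple.bytesize
    end

    current << tuple
    size += needed
  end

  batches << prefix + current.join(', ') unless current.empty?
  batches
end

# Example with made-up data and an artificially small limit.
tuples = ["(1, 'a')", "(2, 'b')", "(3, 'c')"]
statements = build_insert_batches('items', %w[id name], tuples, max_bytes: 60)
statements.each { |sql| puts sql }
```

In a Rails app each resulting statement could then be sent with ActiveRecord::Base.connection.execute inside the surrounding transaction. In practice the limit would be chosen comfortably below the server's actual max_allowed_packet value rather than right at it.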
