Opinions on NetCDF vs. HDF5 for storing scientific data?

Time: 2023-01-03 16:55:16

Anyone out there have enough experience w/ NetCDF and HDF5 to give some pluses / minuses about them as a way of storing scientific data?

I've used HDF5 and would like to read/write via Java but the interface is essentially a wrapper around the C libraries, which I have found confusing, so NetCDF seems intriguing but I know almost nothing about it.

edit: my application is "only" for datalogging, so I want a file in a self-describing format. Important features for me are being able to add arbitrary metadata, fast write access for appending to byte arrays, and single-writer / multiple-reader (SWMR) concurrency (strongly preferred but not a must-have). The NetCDF docs say they have SWMR but don't say whether they support any mechanism for ensuring that two writers can't open the same file at once, with disastrous results. I like the hierarchical aspect of HDF5 (in particular I love the directed-acyclic-graph hierarchy, much more flexible than a "regular" filesystem-like hierarchy). I'm reading the NetCDF docs now... if it only allows one dataset per file then it probably won't work for me. :(
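
On the worry about two writers opening the same file at once: one generic workaround, independent of either library, is an advisory lock file created next to the data file. A minimal sketch in Python, assuming every writer cooperates and checks the lock first; the `WriterLock` class and the `.lock` suffix are made up for illustration and are not part of NetCDF or HDF5:

```python
import os


class WriterLock:
    """Advisory single-writer lock backed by a sidecar lock file (hypothetical helper)."""

    def __init__(self, data_path):
        self.lock_path = data_path + ".lock"
        self._fd = None

    def acquire(self):
        """Return True if this process became the writer, False if someone else holds the lock."""
        try:
            # O_CREAT | O_EXCL makes creation fail if the lock file already exists.
            self._fd = os.open(self.lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            return False
        # Record the owner's PID so a human can tell who holds the lock.
        os.write(self._fd, str(os.getpid()).encode())
        return True

    def release(self):
        """Close and delete the lock file if this object holds it."""
        if self._fd is not None:
            os.close(self._fd)
            os.remove(self.lock_path)
            self._fd = None
```

Readers ignore the lock entirely, so this only addresses the "two writers" half of the problem. A writer that crashes leaves a stale lock file behind, so a real logger would need some cleanup policy on top of this.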

update: looks like NetCDF-Java reads netCDF-4 files but only writes netCDF-3 files, which don't support hierarchical groups. darn.

update 2009-Jul-14: I am starting to get really upset with HDF5 in Java. The library available isn't that great and it has some major stumbling blocks that have to do with Java's abstraction layers (compound data types). A great file format for C but looks like I just lose. >:(

7 answers

#1


I strongly suggest HDF5 instead of NetCDF. NetCDF is flat, and it gets very messy after a while if you are not able to classify stuff. Of course classification is also a matter of debate, but at least you have this flexibility.

We did a careful evaluation of HDF5 vs. NetCDF when I wrote Q5Cost, and HDF5 won hands down.

#2


I have to admit that using HDF5 is much easier in the long run. It's not hard to get simple data structures into NetCDF format, but manipulating them down the road is kind of a pain.

The "H" in HDF5 stands for "hierarchical", which translated (for me anyway) into a REALLY easy way to manipulate data, by just moving nodes around and referencing nodes from other places.

Can I ask what kind of project this is? I use both of these for a lot of HPC scientific modeling tasks. Can I assume you're doing the same? If so, the trend I'm seeing is people moving to HDF5, but that might be different in your particular domain.

Whichever way you end up going, best of luck!

#3


NetCDF, starting with version 4.0 (2008), can read and write most HDF5 files, and provides access to the hierarchical features of HDF5 via the enhanced data model.

HDF5 is extremely feature-rich, and has some great performance features.

NetCDF has a simpler API, and a much wider tool base. There are many tools that handle netCDF data.

#4


I know this is an older post, and the original poster has indicated they've moved on, but for anyone that ends up here...the netCDF-Java library (as of 4.3.13) has netCDF-4 write support via the netCDF C library. It's still in beta, but it does work and feedback is certainly appreciated!

Please see the netCDF-Java reference docs for more details.

#5


Try writing a small sample application in each, and compare the experience. If future scalability of your code to parallel execution (via MPI or the like) is important to you, I know that HDF has a parallel implementation, which people are constantly working to improve. I'm not sure about NetCDF.

Late edit: For NetCDF, there is now Parallel NetCDF from Argonne. It works quite well, and the development team is quite active in improving it further.

#6


1) The netCDF-4 C library is a layer on top of the HDF5 C library. Its API is considered simpler than the HDF5 API, but in the end you have pretty much the same functionality. NetCDF does not support graphs, but HDF5 does. In fact, I think HDF5 does not prevent cycles in your graph.

2) The HDF Group has a Java API on top of the HDF5 C library.

3) Unidata has the netCDF-Java library, which is pure Java, but it can only read HDF5, not write it.
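
To make the tree-vs-graph remark in point 1 concrete: in a tree every node has exactly one parent, while in a graph of groups the same object can be linked from several places, and nothing stops a link from pointing back up. A toy model in plain Python; the `Group` class and `has_cycle` helper are hypothetical stand-ins for illustration, not the HDF5 or netCDF API:

```python
class Group:
    """Toy stand-in for a group: a table of named links to other objects."""

    def __init__(self):
        self.links = {}

    def link(self, name, target):
        """Add a named link to target and return target for chaining."""
        self.links[name] = target
        return target


def has_cycle(root):
    """Depth-first search: True if some group is reachable from itself."""
    on_path = set()   # groups on the current DFS path
    finished = set()  # groups already fully explored without finding a cycle

    def visit(group):
        if id(group) in on_path:
            return True
        if id(group) in finished:
            return False
        on_path.add(id(group))
        for child in group.links.values():
            if isinstance(child, Group) and visit(child):
                return True
        on_path.discard(id(group))
        finished.add(id(group))
        return False

    return visit(root)


# One object linked from two parents: a DAG, which a strict tree cannot express.
root = Group()
a = root.link("a", Group())
b = root.link("b", Group())
shared = a.link("shared", Group())
b.link("shared", shared)
dag_has_cycle = has_cycle(root)

# A link back to the root turns the DAG into a cyclic graph.
shared.link("up", root)
graph_has_cycle = has_cycle(root)
```

Any code that walks such a structure recursively needs a visited set like the one above, otherwise a back-link makes the traversal loop forever.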

#7


NetCDF, which translates HDF5 into its own data model, looks and works great... until you find out that NetCDF doesn't support unsigned values! See also my question on how to detect unsigned values in existing HDF5 files using NetCDF.

Update: Actually, it turns out that although NetCDF-3 doesn't support unsigned values, NetCDF-4 does support them, even though the NetCDF API in Java for determining signedness is a little convoluted.
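
For anyone stuck with a format that only has signed integer types (the netCDF classic types are byte, char, short, int, float and double), the usual trick is to store the bit pattern in the signed type and reinterpret it on read. A minimal sketch in plain Python, assuming two's-complement values of a known bit width; the helper names are made up and this is not the netCDF-Java API:

```python
def signed_to_unsigned(values, bits=8):
    """Reinterpret two's-complement signed integers as unsigned values of the same width."""
    mask = (1 << bits) - 1
    return [v & mask for v in values]


def unsigned_to_signed(values, bits=8):
    """Inverse mapping: fold unsigned values back into the signed range for storage."""
    sign_bit = 1 << (bits - 1)
    return [(v ^ sign_bit) - sign_bit for v in values]


# Bytes read back from a signed 8-bit variable that really holds unsigned data.
stored = [-56, 100, -1]
decoded = signed_to_unsigned(stored)     # [200, 100, 255]
restored = unsigned_to_signed(decoded)   # [-56, 100, -1]
```

The catch is that the file itself has to say somewhere (an attribute, a naming convention, documentation) that the variable is meant to be read as unsigned, since the stored type alone cannot tell you.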
