Writing binary data with Haskell to be read by C?

Time: 2022-04-10 16:08:38

I have a file containing a [Double] serialized by Data.Binary that I'd like to read with C. That is, I want to write a C program that reads that data into memory as a double[]. I'm planning on writing a Haskell program to deserialize the data file and then write the binary data into a new, simpler file that I can read directly from C, but I'm not sure how to write out just the raw binary data (e.g. 8 bytes for a double).
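A minimal sketch of the conversion program described above, under a few assumptions: the input file holds exactly one value written with Data.Binary's `encode`/`encodeFile` at type [Double], GHC ≥ 8.2 (so `castDoubleToWord64` from GHC.Float is available), and the C side runs on a little-endian machine. The helper names `putDoublesLE` and `rawBytes` and the program name `convert` are hypothetical. The output has no header; the C program can compute the element count as the file size divided by 8.

```haskell
import Data.Binary (decodeFile)
import Data.Binary.Put (Put, runPut, putWord64le)
import qualified Data.ByteString.Lazy as BL
import GHC.Float (castDoubleToWord64)
import System.Environment (getArgs)

-- Hypothetical helper: each Double becomes exactly 8 bytes, namely its
-- IEEE-754 bit pattern in little-endian order. No length header is written.
putDoublesLE :: [Double] -> Put
putDoublesLE = mapM_ (putWord64le . castDoubleToWord64)

rawBytes :: [Double] -> BL.ByteString
rawBytes = runPut . putDoublesLE

main :: IO ()
main = do
  args <- getArgs
  case args of
    [inPath, outPath] -> do
      -- Assumes the input was produced by encodeFile with type [Double].
      xs <- decodeFile inPath :: IO [Double]
      BL.writeFile outPath (rawBytes xs)
    _ -> putStrLn "usage: convert <input.bin> <output.raw>"
```

On the C side, such a file can then be `fread` straight into a `double[]` of `size / 8` elements, provided the machine is little-endian and uses IEEE-754 doubles.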


2 answers

#1



Using Data.Binary to serialize Double or Float values is not great for portability. The Binary instances serialize the values in the form obtained by decodeFloat, i.e. as a mantissa and an exponent, and the mantissa is serialized as an Integer. That is inconvenient to parse. As ehird has already suggested, it is much better to use a variant that serializes them as the bit pattern of their IEEE-754 representation, as offered by cereal-ieee754 (which, as ehird reminded me, has since been merged into cereal, minus some conversions between floating-point and word types) or by the already mentioned data-binary-ieee754. Another option is serializing them as strings via show, which has the advantage of avoiding any endianness problems.
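To make the difference concrete, here is a small sketch comparing what the stock Binary instance writes for one Double with the raw IEEE-754 bits a C `double` holds, plus the show-based text option. It assumes GHC ≥ 8.2 for `castDoubleToWord64` (GHC.Float); the exact byte count of the decodeFloat encoding is deliberately not asserted, only that it is not the 8 bytes C expects.

```haskell
import qualified Data.ByteString.Lazy as BL
import Data.Binary (encode)
import GHC.Float (castDoubleToWord64)
import Numeric (showHex)

main :: IO ()
main = do
  let x = 1.5 :: Double
  -- Binary's stock Double instance puts the (mantissa, exponent) pair from
  -- decodeFloat, where x == fromIntegral m * 2 ^^ e.
  print (decodeFloat x)                                 -- (6755399441055744,-52)
  -- Length of that encoding: an Integer plus an Int, not 8 raw bytes.
  print (BL.length (encode x))
  -- The IEEE-754 bit pattern is always exactly 64 bits.
  putStrLn ("0x" ++ showHex (castDoubleToWord64 x) "")  -- 0x3ff8000000000000
  -- The text option: show output reads back to the identical Double.
  print (read (show x) == x)                            -- True
```

The hex value is the same 8-byte pattern a C program would see in memory for `1.5`, which is why an IEEE-754 encoding can be read with a plain `fread`.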


#2



You can reuse Data.Binary for the purpose with the data-binary-ieee754 package, which allows serialising Floats and Doubles as their IEEE representation. For example:


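```haskell
import Data.List (genericLength)
import Data.Binary.Put (Put, putWord64le)
import Data.Binary.IEEE754 (putFloat64le)  -- from the data-binary-ieee754 package

-- Write the element count as a little-endian Word64, followed by
-- each Double as its 8-byte little-endian IEEE-754 bit pattern.
putRawDoubles :: [Double] -> Put
putRawDoubles xs = do
  putWord64le (genericLength xs)
  mapM_ putFloat64le xs
```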
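A sketch of the matching reader, written in Haskell so the layout can be checked before writing the C side: read a little-endian Word64 count, then that many 8-byte doubles. To keep it dependency-free it swaps data-binary-ieee754's `putFloat64le` for `putWord64le . castDoubleToWord64` (GHC.Float, GHC ≥ 8.2), which produces the same bytes; data-binary-ieee754 also provides `getFloat64le` for the reading direction. `getRawDoubles` is a hypothetical name.

```haskell
import Control.Monad (replicateM)
import Data.Binary.Get (Get, runGet, getWord64le)
import Data.Binary.Put (Put, runPut, putWord64le)
import qualified Data.ByteString.Lazy as BL
import Data.List (genericLength)
import GHC.Float (castDoubleToWord64, castWord64ToDouble)

-- Same layout as putRawDoubles: Word64 count, then IEEE-754 doubles,
-- everything little-endian.
putRawDoubles :: [Double] -> Put
putRawDoubles xs = do
  putWord64le (genericLength xs)
  mapM_ (putWord64le . castDoubleToWord64) xs

-- The steps a C reader performs: read the count, then that many doubles.
getRawDoubles :: Get [Double]
getRawDoubles = do
  n <- getWord64le
  replicateM (fromIntegral n) (castWord64ToDouble <$> getWord64le)

main :: IO ()
main = do
  let xs = [1.5, -2.25, 1.0e-3]
      bytes = runPut (putRawDoubles xs)
  print (BL.length bytes)                   -- 32
  print (runGet getRawDoubles bytes == xs)  -- True
```

To produce the actual file, hand the same `bytes` to `BL.writeFile`. The file is 8 bytes of header plus 8 bytes per element, so a C program can `fread` a `uint64_t` count and then allocate and read that many doubles.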

It would be nice if there were an analogue of putWord64host for Doubles in data-binary-ieee754, but since there isn't, I just went with little-endian. If you want to be portable across endiannesses without explicitly handling the conversion in your C program, you could try putWord64host . doubleToWord (doubleToWord is also from Data.Binary.IEEE754), which writes the IEEE-754 bits in the host's native byte order. I think, though, that integer endianness differs from floating-point endianness on some platforms...
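A sketch of the host-order variant, using `castDoubleToWord64` from GHC.Float (GHC ≥ 8.2) in place of `doubleToWord`; both turn a Double into its 64-bit IEEE-754 pattern. It carries the same caveat as the answer: the bytes only match a C `double` where floating-point and integer byte orders agree. `putDoubleHost` and `roundTrip` are hypothetical names. Newer releases of binary may also export a ready-made `putDoublehost` in Data.Binary.Put; check the documentation for your version.

```haskell
import Control.Monad (replicateM)
import Data.Binary.Get (runGet, getWord64host)
import Data.Binary.Put (Put, runPut, putWord64host)
import GHC.Float (castDoubleToWord64, castWord64ToDouble)

-- Write a Double's IEEE-754 bits in the machine's native byte order.
-- Assumes doubles and 64-bit integers share the same byte order on this host.
putDoubleHost :: Double -> Put
putDoubleHost = putWord64host . castDoubleToWord64

-- Read the values back in native order, mirroring a C fread() into double[].
roundTrip :: [Double] -> [Double]
roundTrip xs =
  runGet (replicateM (length xs) (castWord64ToDouble <$> getWord64host))
         (runPut (mapM_ putDoubleHost xs))

main :: IO ()
main = print (roundTrip [1.5, -2.25, 1.0e-3])  -- [1.5,-2.25,1.0e-3]
```

The trade-off is that the file is no longer portable between machines: it is only correct when written and read on hosts with the same byte order.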


Incidentally, I would suggest using a format like this even for your regular serialisation; IEEE floats are universal, and binary's default floating-point format is wasteful (as Daniel Fischer points out).
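One way to follow that suggestion without changing every `put` call is a newtype whose Binary instance uses the fixed 8-byte IEEE-754 layout. This is a sketch, not part of any library: `IEEEDouble` is a hypothetical name, and it assumes GHC ≥ 8.2 for the GHC.Float conversions.

```haskell
import Data.Binary (Binary (..), encode, decode)
import Data.Binary.Get (getWord64le)
import Data.Binary.Put (putWord64le)
import qualified Data.ByteString.Lazy as BL
import GHC.Float (castDoubleToWord64, castWord64ToDouble)

-- Hypothetical wrapper: a Double serialized as 8 little-endian IEEE-754
-- bytes instead of binary's default decodeFloat-based encoding.
newtype IEEEDouble = IEEEDouble { getIEEEDouble :: Double }
  deriving (Eq, Show)

instance Binary IEEEDouble where
  put (IEEEDouble d) = putWord64le (castDoubleToWord64 d)
  get = IEEEDouble . castWord64ToDouble <$> getWord64le

main :: IO ()
main = do
  let x = IEEEDouble 1.5
  print (BL.length (encode x))    -- 8
  print (decode (encode x) == x)  -- True
```

Note that `encode` on a list of these still uses binary's own list instance, which writes the length first as an Int (a big-endian 64-bit value), so a C reader must account for that prefix.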


You might also want to consider the cereal serialisation library, which is faster than binary, better maintained (binary had not been updated since 2009 when this was written), and has built-in support for the IEEE float format.
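For comparison, a sketch of the same count-plus-doubles layout written with cereal, assuming the cereal package is installed and provides `putFloat64le`/`getFloat64le` in its Data.Serialize.IEEE754 module. Unlike binary, cereal's `runPut` returns a strict ByteString and its `runGet` returns an `Either` with an error message.

```haskell
import Control.Monad (replicateM)
import qualified Data.ByteString as BS
import Data.Serialize.Get (Get, runGet, getWord64le)
import Data.Serialize.IEEE754 (getFloat64le, putFloat64le)
import Data.Serialize.Put (Put, runPut, putWord64le)

-- Little-endian Word64 count, then each Double as 8 little-endian IEEE-754 bytes.
putRawDoubles :: [Double] -> Put
putRawDoubles xs = do
  putWord64le (fromIntegral (length xs))
  mapM_ putFloat64le xs

-- Reverse direction: read the count, then that many doubles.
getRawDoubles :: Get [Double]
getRawDoubles = do
  n <- getWord64le
  replicateM (fromIntegral n) getFloat64le

main :: IO ()
main = do
  let bytes = runPut (putRawDoubles [1.5, -2.25])
  print (BS.length bytes)             -- 24
  print (runGet getRawDoubles bytes)  -- Right [1.5,-2.25]
```

The bytes are identical to what the binary plus data-binary-ieee754 version produces, so the same C reader works for either.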

