Format/encoding of this binary data file

Time: 2022-05-14 06:56:03

I'm attempting to write a program that integrates with Advent Axys, software for financial planners and the like. The product's site is here: http://www.advent.com/solutions/asset-managers-software/axys-platform


I need to write new entries into the price files, but much of each file is binary. I looked around online and didn't find much. I also emailed their support, but I doubt that will help.

I have a short dummy file and the printout that the program produces for that file. I ran the file through a Ruby script that prints each byte as a character if it is a word character or symbol, and as its numeric byte value otherwise. Here's the Ruby script:

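# Dump the file byte by byte: word characters and common symbols are printed
# as-is; every other byte is printed as its decimal value, padded with spaces.
pri = File.binread '062109_dummy.pri'  # binread avoids newline/encoding translation
pri.each_byte do |byte|
  print byte.chr =~ /[\w!@#\$%\^&\*\(\)\-\\\/\+\.]/ ? byte.chr : ' ' + byte.to_s + ' '
end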

And output:


pri1.001 254  250  251  252  29  0  0  2 adusnok 0  0  0  0  0  0  0  0  0 33333s7@ 1  254  250  251  252  29  0  0  2 csusxom 0  0  0  0  0  0  0  0  0 H 225 z 20  174 GA@ 1  254  250  251  252  29  0  0  2 etusvv 0  0  0  0  0  0  0  0  0  0  246 (\ 143  194  213 F@ 1  254  250  251  252  29  0  0  2 fdusoakbx 0  0  0  0  0  0  0  174 G 225 z 20  174 (@ 1  254  250  251  252  29  0  0  2 oousfidde09 0  0  0  0  0  154  153  153  153  153  185 S@ 1  254  250251  252  29  0  0  2 qpusfid_eqix 0  0  0  0  164 p 61  10  215 cL@ 1  254  250  251  252  29  0  0  2 vausvg_sc 0  0  0  0  0  0  0 )\ 143  194  245  248 P@ 1 

Note that if a number has spaces around it, it is the numeric value of a byte; if it doesn't, the byte was the ASCII character for that digit.

I know that the strings of letters (like "adusnok") identify the stocks and the like. They are followed by zeroed bytes because the space for a symbol is fixed-size (which is why there are fewer 0's after a longer symbol). The sequence @ 1 254 250 251 252 29 0 0 2 seems to signify the end of a record, coming right before the symbol for a new one. Alternatively, some of it could encode something that is the same for all of these records, but not much else in the dump is the same across them. Beyond that, I know basically nothing. I do have the printout of what the program thinks the file maps to. With 3 spaces separating each column, it is:

adus   nok   23.45   NOKIA CORP ADR   0.393  05/30/2008
csus   xom   34.56   EXXON MOBIL CORPORATION COM   1.68   06/10/2009
etus   vv    45.67   VANGUARD LRG CAP ETF US PRIME MKT 750   1.04   3/31/2009

There's more, but that should give you a pretty good idea. I think it's quite possible that the descriptions, and possibly other things, are stored in other files and just looked up. But I know that the prices are in that file, because these are price files and that's the whole point. So:

33333s7 => 23.45
H 225 z 20 174 GA => 34.56
246 (\ 143 194 213 F => 45.67

Note that, save the 3's and the 7 in the first one, all of the numbers there are byte values, not the ASCII representations of digits. Also note that those bytes could represent a little more than just the price, but they definitely include the price.

Any ideas? I'm not familiar with common binary encodings, but I wouldn't be surprised if they used a fairly common method.


1 solution

#1


Reverse engineering a binary format is dangerous if you are going to ship your reverse-engineered codec. They may change the file format without warning. However, if you are bound and determined to do it:

One thing you could do is to look at the format for IEEE floating point numbers:


http://steve.hollasch.net/cgindex/coding/ieeefloat.html
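To make the layout concrete, here is a small sketch that takes the eight bytes "33333s7@" from the first record of the dump in the question and splits them into the sign, exponent, and fraction fields of an IEEE 754 double. Reading them as little-endian is an assumption on my part, not something Advent documents, but under that assumption the fields reassemble to the 23.45 shown in the printout.

```ruby
# Eight bytes copied from the question's dump (first record), as raw bytes.
bytes = "33333s7@".b

# Assumption: the bytes form one little-endian 64-bit value.
bits = bytes.unpack1('Q<')

sign     = bits >> 63                 # 1 bit
exponent = (bits >> 52) & 0x7ff       # 11 bits, biased by 1023
fraction = bits & ((1 << 52) - 1)     # 52 bits, implicit leading 1

# Reassemble the value by hand from the three fields (normal numbers only).
value = (-1)**sign * (1 + fraction.to_f / (1 << 52)) * 2.0**(exponent - 1023)

puts format('bits     = 0x%016x', bits)
puts "sign     = #{sign}"
puts "exponent = #{exponent} (unbiased #{exponent - 1023})"
puts "value    = #{value}"
```

If that holds for the real file, the trailing "@" (0x40) would be the high byte of the price rather than part of a record separator.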

And then, starting at the first byte in the file, read 4 or 8 bytes of data. Convert both sets (4 bytes and 8 bytes) to float and double values. Check to see if they match the values that you know are in the file. If so, you have probably found the offset of a price. Print it out, plus the offset. If not, increment your seek by one byte and try again.
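A minimal sketch of that scan, in Ruby since that is what the question uses. `find_price_offsets` is a hypothetical helper name, and the sample bytes are rebuilt by hand from the first record of the dump rather than read from a real price file. It tries both byte orders for each width, because the byte order of the file isn't known up front.

```ruby
# Slide a window over the raw bytes and report every offset whose 4- or 8-byte
# IEEE 754 interpretation matches one of the known prices.
# find_price_offsets is a hypothetical helper, not part of any Axys API.
def find_price_offsets(data, known_prices, tolerance = 1e-6)
  # String#unpack1 directives: e/g = 32-bit float (little/big endian),
  # E/G = 64-bit double (little/big endian).
  formats = { 'e' => 4, 'g' => 4, 'E' => 8, 'G' => 8 }
  hits = []
  (0...data.bytesize).each do |offset|
    formats.each do |directive, width|
      chunk = data.byteslice(offset, width)
      next if chunk.nil? || chunk.bytesize < width  # window runs past the end
      value = chunk.unpack1(directive)
      next unless value.finite?                     # skip NaN and infinities
      known_prices.each do |price|
        hits << [offset, directive, price] if (value - price).abs < tolerance
      end
    end
  end
  hits
end

# Sample rebuilt from the dump: the symbol "adusnok", nine zero bytes,
# then the eight bytes "33333s7@".
sample = "adusnok".b + ("\x00".b * 9) + "33333s7@".b
hits = find_price_offsets(sample, [23.45])
hits.each do |offset, directive, price|
  puts "offset #{offset}: #{price} (unpack directive '#{directive}')"
end
```

For a real file you would load the bytes with `File.binread` and pass in every price from the printout. The tolerance is there because a price stored as a 4-byte float will not compare exactly equal to the decimal value you typed.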


If you can find all the values that way, then you might be able to safely patch the binary files at runtime by performing a similar operation: looking for the prices that you know are there, and then modifying the price values in the right place.
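As a rough sketch of such a patch: the hypothetical `patch_price` below overwrites eight bytes at a known offset, but only after checking that the bytes currently there decode to the price you expect. The little-endian double layout (`'E'`) is an assumption based on the sample dump, and the record is again rebuilt by hand from the question.

```ruby
# Overwrite one price, but only if the bytes currently stored at the given
# offset decode to the price we expect to find there.
# patch_price is a hypothetical helper; the 'E' (little-endian double) layout
# is an assumption based on the sample dump, not on Advent documentation.
def patch_price(data, offset, old_price, new_price)
  current = data.byteslice(offset, 8)
  unless current && current.bytesize == 8 && current.unpack1('E') == old_price
    raise ArgumentError, "no #{old_price} at offset #{offset}"
  end
  patched = data.b                            # binary copy; input stays untouched
  patched[offset, 8] = [new_price].pack('E')  # same width, so nothing shifts
  patched
end

# Record rebuilt from the dump: symbol, zero padding, then the price bytes.
record  = "adusnok".b + ("\x00".b * 9) + "33333s7@".b
patched = patch_price(record, 16, 23.45, 24.10)
puts patched.byteslice(16, 8).unpack1('E')
```

On a real file you would read with `File.binread`, write the result with `File.binwrite`, and work on a copy until Axys has been seen to accept the patched file.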


This isn't foolproof at all, because random sequences of data will sometimes match up. If you notice a definite distance between offsets, or some sigil that is always present, or perhaps even better, if you can find those offset values back in the file, you may have something modestly stable.
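One way to check for a fixed distance and a constant sigil is sketched below. Everything about the record layout here is my reconstruction from the dump in the question (an 8-byte header "pri1.001", then per record the bytes 254 250 251 252 29 0 0 2, a 16-byte zero-padded symbol, an 8-byte little-endian double, and a single byte 1); the helper names are hypothetical, and a real file may well differ.

```ruby
# The 8-byte sequence that precedes every symbol in the dump.
def marker_bytes
  [254, 250, 251, 252, 29, 0, 0, 2].pack('C*')
end

# Assumed record layout: marker, 16-byte zero-padded symbol, double, byte 1.
def build_record(symbol, price)
  marker_bytes + symbol.b.ljust(16, "\x00".b) + [price].pack('E') + "\x01".b
end

# Offsets at which an 8-byte little-endian double equals a known price.
def double_offsets(data, prices)
  (0..data.bytesize - 8).select do |offset|
    prices.include?(data.byteslice(offset, 8).unpack1('E'))
  end
end

data = "pri1.001".b +
       build_record("adusnok", 23.45) +
       build_record("csusxom", 34.56) +
       build_record("etusvv", 45.67)

offsets = double_offsets(data, [23.45, 34.56, 45.67])

# A single distinct gap between consecutive hits hints at fixed-size records.
gaps = offsets.each_cons(2).map { |a, b| b - a }.uniq

# Does the same marker sit at the same distance before every price?
marker_before = offsets.all? { |o| data.byteslice(o - 24, 8) == marker_bytes }

puts "offsets: #{offsets.inspect}"
puts "distinct gaps: #{gaps.inspect}"
puts "marker precedes every price: #{marker_before}"
```

Run against the real file, with offsets from an actual scan, one repeated gap plus a constant marker would be decent evidence that the hits are real records and not coincidental byte patterns.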

