How can I figure out what code page I am looking at?

Time: 2023-01-27 15:01:08

I have a device with some documentation on how to send it text. It uses 0x00-0x7F to send 'special' characters like accented characters, euro signs, ...

I am guessing they copied an existing code page and made some changes, but I have no idea how to figure out what code page is closest to the one in my documentation.

In theory, this should be easy to do. For example, they map Á to 0x41, so if I could find some way to go through all code pages and find the ones that have this character at that position, it would be a piece of cake.
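
The brute-force search described above can be sketched with the codecs that ship with Python. This is a minimal sketch, assuming that "all code pages" means every text codec your Python installation knows about (which is far from every code page ever used in devices); the names `all_codec_names` and `matching_codecs` are made up for illustration.

```python
import encodings
import encodings.aliases
import pkgutil


def all_codec_names() -> list[str]:
    """Collect every codec name this Python installation knows about."""
    # Codec modules living in the `encodings` package...
    names = {m.name for m in pkgutil.iter_modules(encodings.__path__)}
    # ...plus every codec that is reachable through an alias.
    names.update(encodings.aliases.aliases.values())
    return sorted(names)


def matching_codecs(expected: dict[int, str]) -> list[str]:
    """Return the codecs in which every documented byte decodes to the documented character."""
    matches = []
    for name in all_codec_names():
        try:
            if all(bytes([b]).decode(name) == ch for b, ch in expected.items()):
                matches.append(name)
        except (LookupError, ValueError, TypeError):
            # Not a text codec, not available on this platform, or the byte
            # is invalid in this codec: treat it as "no match".
            continue
    return matches


if __name__ == "__main__":
    # The single data point from the device documentation: 0x41 -> "Á".
    print(matching_codecs({0x41: "Á"}))
    # For comparison: the position Á usually has (0xC1).
    print(matching_codecs({0xC1: "Á"}))
```

Adding more documented byte/character pairs to the dictionary narrows the list; with a whole table it becomes a real lookup rather than a guess.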

However, all I can find on the internet are links to code page dumps just like the one I'm looking at, or software that uses heuristics to read text and guess the most likely code page. Surely someone out there has made it possible to look up which code page one is looking at?

5 answers

#1


If it uses 0x00 to 0x7F for the "special" characters, how does it encode the regular ASCII characters?

In most of the charsets that support the character Á, its codepoint is 193 (0xC1). If you subtract 128 from that, you get 65 (0x41). Maybe your "codepage" is just the upper half of one of the standard charsets like ISO-8859-1 or windows-1252, with the high-order bit set to zero instead of one (that is, subtracting 128 from each one).
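
A quick way to test this theory is to add 0x80 back to each byte (0x41 + 0x80 = 0xC1) and decode the result with a standard code page. This is a minimal Python sketch, assuming every byte you feed it is one of the device's "special" characters; `decode_high_half` is a hypothetical helper, and windows-1252 is only the default guess.

```python
def decode_high_half(raw: bytes, codec: str = "cp1252") -> str:
    """Decode device bytes as if they were the upper half of a standard code page.

    Assumption: each byte is a 'special' character whose real code is byte + 0x80.
    Setting bit 7 with `| 0x80` is the same as adding 128 for bytes 0x00-0x7F.
    Bytes that are undefined in `codec` become U+FFFD instead of raising.
    """
    shifted = bytes(b | 0x80 for b in raw)
    return shifted.decode(codec, errors="replace")


if __name__ == "__main__":
    print(decode_high_half(b"\x41"))             # 0x41 | 0x80 = 0xC1 -> Á
    print(decode_high_half(b"\x41", "latin_1"))  # also Á in ISO-8859-1
```

Since the question mentions euro signs, note that ISO-8859-1 has no € at all, while windows-1252 has it at 0x80. So if the device sends € as 0x00, that would point towards windows-1252 rather than ISO-8859-1.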

If that's the case, I would expect to find a flag you can set to tell it whether the next bunch of codepoints should be converted using the "upper" or "lower" encoding. I don't know of any system that uses that scheme, but it's the most sensible explanation I can come up with for the situation you describe.
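
To make the "flag" idea concrete, here is a purely hypothetical Python sketch. It borrows the Shift Out (0x0E) and Shift In (0x0F) control bytes that ISO/IEC 2022 uses for switching between character sets; whether this device has any such flag, and which byte it would be, is unknown, so `SHIFT_OUT`, `SHIFT_IN` and `decode_with_shift` are illustrative names only.

```python
SHIFT_OUT = 0x0E  # hypothetical: switch to the "upper" table
SHIFT_IN = 0x0F   # hypothetical: switch back to plain ASCII


def decode_with_shift(raw: bytes, upper_codec: str = "cp1252") -> str:
    """Decode a byte stream in which a control byte selects the lower or upper table."""
    out = []
    upper = False  # start in plain ASCII mode
    for b in raw:
        if b == SHIFT_OUT:
            upper = True
        elif b == SHIFT_IN:
            upper = False
        elif upper:
            # Upper mode: treat the byte as the upper half of `upper_codec`.
            out.append(bytes([b | 0x80]).decode(upper_codec, errors="replace"))
        else:
            # Lower mode: the byte is an ordinary ASCII character.
            out.append(chr(b))
    return "".join(out)


if __name__ == "__main__":
    print(decode_with_shift(b"A\x0eA\x0fA"))  # plain A, shifted A (Á), plain A
```

If the device documentation mentions any mode-switching or escape byte, that is the place to plug it in.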

#2


There is no way to auto-detect the codepage without additional information. Below the display layer it’s just bytes and all bytes are created equal. There’s no way to say “I’m a 0x41 from this and that codepage”, there’s only “I’m 0x41. Display me!”
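
To illustrate: the byte carries no label, so the same value turns into a different character depending on which code page you assume. A small Python sketch using four real code pages from the standard library (the selection is arbitrary; `decode_everywhere` is just an illustrative name):

```python
def decode_everywhere(byte: int, code_pages=("cp1252", "cp437", "koi8_r", "cp037")) -> dict:
    """Decode one byte under several code pages and return {code page: character}."""
    return {name: bytes([byte]).decode(name) for name in code_pages}


if __name__ == "__main__":
    for name, ch in decode_everywhere(0xC1).items():
        print(f"0xC1 as {name}: {ch}")
```

The single byte 0xC1 comes out as Á (windows-1252), ┴ (cp437), Cyrillic а (KOI8-R) and a plain A (EBCDIC cp037). Nothing in the byte says which one was meant.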

#3


What is the system's endianness? Perhaps the bit order is being flipped?
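
Endianness is about the order of bytes inside multi-byte values, so on its own it cannot change a single-byte code; the effect that could matter here is a reversed bit order within each byte. A minimal Python sketch for checking that hypothesis (`reverse_bits` and `decode_bit_reversed` are illustrative names, and windows-1252 is only an assumed target):

```python
def reverse_bits(byte: int) -> int:
    """Reverse the order of the 8 bits in one byte (bit 0 <-> bit 7, bit 1 <-> bit 6, ...)."""
    return int(f"{byte:08b}"[::-1], 2)


def decode_bit_reversed(raw: bytes, codec: str = "cp1252") -> str:
    """Decode device bytes assuming each one arrives with its bit order reversed."""
    return bytes(reverse_bits(b) for b in raw).decode(codec, errors="replace")


if __name__ == "__main__":
    print(hex(reverse_bits(0x41)))  # 0b01000001 -> 0b10000010
    print(hex(reverse_bits(0xC1)))  # 0b11000001 -> 0b10000011
```

For the one documented pair, reversing 0x41 gives 0x82 (a low quotation mark in windows-1252, not Á), and reversing Á's usual code 0xC1 gives 0x83, so that data point alone does not support this idea. A fuller table would show whether the pattern holds anywhere else.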

#4


In most codepages, 0x41 is just the normal "A"; I don't think any standard codepage has "Á" in that position. The device might send a control character somewhere before the A that adds the accent, or it might use a non-standard codepage.
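
The "control character that adds the accent" idea is not exotic: ISO/IEC 6937, for example, encodes accented letters as a non-spacing diacritic byte followed by the base letter. A minimal Python sketch of that scheme, assuming made-up prefix bytes 0x01 (acute) and 0x02 (grave); the real values, if the device works this way at all, would have to come from its documentation.

```python
import unicodedata

# Hypothetical prefix bytes: each one puts a diacritic on the *next* character.
ACCENT_PREFIXES = {
    0x01: "\u0301",  # combining acute accent
    0x02: "\u0300",  # combining grave accent
}


def decode_with_accent_prefix(raw: bytes) -> str:
    """Decode ASCII bytes where a prefix byte adds an accent to the following letter."""
    out = []
    pending = None  # combining mark waiting for its base letter
    for b in raw:
        if b in ACCENT_PREFIXES:
            pending = ACCENT_PREFIXES[b]
            continue
        ch = chr(b)
        if pending is not None:
            ch += pending  # base letter followed by the combining mark
            pending = None
        out.append(ch)
    # A trailing prefix with no letter after it is silently dropped.
    # NFC composes e.g. "A" + U+0301 into the single code point "Á" (U+00C1).
    return unicodedata.normalize("NFC", "".join(out))


if __name__ == "__main__":
    print(decode_with_accent_prefix(b"\x01A"))  # Á
```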

I don't see any use in knowing the "closest codepage"; you just need to use the docs you got with the device.

Your last sentence is puzzling: what do you mean by "possible to look up what code page one is looking at"?

If you include your whole codepage, people here on SO can be more helpful and give you more insight into this issue; a single data point (0x41 = Á) doesn't help much.
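
With the whole table in hand, "closest code page" can be made measurable: score each candidate by the fraction of documented bytes it reproduces, both as-is and with 0x80 added (the theory from the first answer). A minimal Python sketch; the candidate list is an arbitrary assumption, `rank_code_pages` is an illustrative name, and the only real table entry known here is the question's 0x41 = Á.

```python
CANDIDATES = ("latin_1", "cp1252", "iso8859_15", "cp437", "cp850", "mac_roman")
OFFSETS = (0x00, 0x80)  # bytes as documented, or with 128 added


def rank_code_pages(table: dict[int, str]) -> list[tuple[str, int, float]]:
    """Return (code page, offset, fraction of matching entries), best match first."""
    if not table:
        return []
    results = []
    for codec in CANDIDATES:
        for offset in OFFSETS:
            hits = 0
            for byte, expected in table.items():
                actual = bytes([(byte + offset) & 0xFF]).decode(codec, errors="replace")
                if actual == expected:
                    hits += 1
            results.append((codec, offset, hits / len(table)))
    # Stable sort: ties keep the order of CANDIDATES and OFFSETS.
    results.sort(key=lambda r: r[2], reverse=True)
    return results


if __name__ == "__main__":
    for codec, offset, score in rank_code_pages({0x41: "Á"})[:4]:
        print(f"{codec:10} +0x{offset:02X}: {score:.0%}")
```

With only one entry, several code pages tie at 100%; the ranking only becomes informative once the rest of the device's table is filled in.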

#5


Somewhat random idea, but if you can copy a significant amount of the text off the device, you could try running it through something like the detect function in http://chardet.feedparser.org/.
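
A minimal sketch of that, assuming the third-party `chardet` package is installed (`pip install chardet`); it returns `None` otherwise. Because bytes in 0x00-0x7F will most likely just be reported as ASCII, the sketch also runs detection on a copy with the high bit set, in the spirit of the first answer. `guess_encoding` and `guess_both_ways` are illustrative names, and chardet needs a fair amount of text before its guess means anything.

```python
from typing import Optional


def guess_encoding(raw: bytes) -> Optional[dict]:
    """Ask chardet for its best guess; returns None if chardet is not installed."""
    try:
        import chardet  # third-party package
    except ImportError:
        return None
    # chardet.detect returns a dict that includes "encoding" and "confidence".
    return chardet.detect(raw)


def guess_both_ways(raw: bytes) -> dict:
    """Run detection on the raw bytes and on a copy with bit 7 set on every byte."""
    return {
        "as-is": guess_encoding(raw),
        "high bit set": guess_encoding(bytes(b | 0x80 for b in raw)),
    }


if __name__ == "__main__":
    sample = b"..."  # replace with a large dump of text captured from the device
    print(guess_both_ways(sample))
```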
