Do uppercase and lowercase letters differ by only one bit?

I found an example in the book Data Communications and Networking by Behrouz Forouzan about uppercase and lowercase letters, which differ by only one bit in the 7-bit code.

For example, character A is 1000001 (0x41) and character a is 1100001 (0x61). The difference is in bit 6, which is 0 in uppercase letters and 1 in lowercase letters. If we know the code for one case, we can easily find the code for the other by adding or subtracting 32 in decimal, or we can just flip the sixth bit.

What does all this mean?

I have found myself very confused with all these things. Could someone provide examples of how these things really work out?

6 answers

#1

Let's use a case that you'll find more familiar: base 10.

Suppose we have a base 10 computer, where each 10bit stores a value from 0 to 9, and a 10byte is five 10bits long, so that each 10byte can store 100,000 values (0 through 99,999).

You wish to assign letters to particular positions in a 10byte so that this computer can communicate text data with other computers. One way you could do this would be like so:

00101 A    00201 a
00102 B    00202 b
00103 C    00203 c
00104 D    00204 d
00105 E    00205 e
00106 F    00206 f
00107 G    00207 g
00108 H    00208 h
00109 I    00209 i
00110 J    00210 j
00111 K    00211 k
00112 L    00212 l
00113 M    00213 m
00114 N    00214 n
00115 O    00215 o
00116 P    00216 p
00117 Q    00217 q
00118 R    00218 r
00119 S    00219 s
00120 T    00220 t
00121 U    00221 u
00122 V    00222 v
00123 W    00223 w
00124 X    00224 x
00125 Y    00225 y
00126 Z    00226 z

Do you see that each lowercase letter differs from its uppercase letter in only a single 10bit digit, the 3rd column from the right? It didn't have to be designed this way. It was simply convenient: any time we want to change the case of a letter, we can modify that one digit (10bit) without caring what the rest of the number is, instead of bothering with twenty-six different transformations when one will do. We couldn't have chosen the second digit from the right, because then the two cases would be only 10 apart instead of 100, and the ranges would overlap (00101-00126 and 00111-00136 share 00111-00126).
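To make the base-10 analogy concrete, here is a small Java sketch of the made-up "10ASCII" code above. The class and method names (TenAscii, encode, toggleCase) are illustrative only; the point is that changing case touches nothing but the hundreds digit.

```java
// Sketch of the hypothetical "10ASCII" code from this answer:
// uppercase letters are 00101-00126, lowercase letters are 00201-00226.
final class TenAscii {
    // Returns the hypothetical 10ASCII code for a letter A-Z or a-z.
    static int encode(char letter) {
        if (letter >= 'A' && letter <= 'Z') return 100 + (letter - 'A' + 1);
        if (letter >= 'a' && letter <= 'z') return 200 + (letter - 'a' + 1);
        throw new IllegalArgumentException("not a letter: " + letter);
    }

    // Switches case by changing only the hundreds digit (the 3rd 10bit from the right):
    // 1 there means uppercase, 2 means lowercase. Assumes `code` is a valid letter code.
    static int toggleCase(int code) {
        int hundreds = (code / 100) % 10;
        return hundreds == 1 ? code + 100 : code - 100;
    }

    public static void main(String[] args) {
        System.out.println(encode('G'));             // 107
        System.out.println(toggleCase(encode('G'))); // 207, the code for g
    }
}
```

Note that toggleCase never looks at the last two digits, which is exactly the convenience described above.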

Now, base 2 works exactly the same way, except that instead of each digit representing 0-9, it can only represent 0-1. Using eight bits gives us only 2^8 = 256 possible combinations, 0-255. The ASCII codes for the uppercase and lowercase letters in binary look like this:

```
01000001 A        01100001 a
01000010 B        01100010 b
01000011 C        01100011 c
01000100 D        01100100 d
01000101 E        01100101 e
01000110 F        01100110 f
01000111 G        01100111 g
01001000 H        01101000 h
01001001 I        01101001 i
01001010 J        01101010 j
01001011 K        01101011 k
01001100 L        01101100 l
01001101 M        01101101 m
01001110 N        01101110 n
01001111 O        01101111 o
01010000 P        01110000 p
01010001 Q        01110001 q
01010010 R        01110010 r
01010011 S        01110011 s
01010100 T        01110100 t
01010101 U        01110101 u
01010110 V        01110110 v
01010111 W        01110111 w
01011000 X        01111000 x
01011001 Y        01111001 y
01011010 Z        01111010 z
```
Just as before, each pair differs in only one binary digit, here the 6th column from the right (bit 5, counting from 0). We couldn't have used a digit any farther to the right (a smaller one), because then the lists would have overlapped: bits 0 through 4 give 2^5 = 32 combinations, enough for the 26 letters, but if the case bit were bit 4, only bits 0 through 3 would be left, giving 2^4 = 16 combinations, which cannot cover the 26 letters of the alphabet.
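A short Java sketch that regenerates the table above from the real ASCII codes, getting each lowercase letter by flipping only the bit worth 32, and reading the letter's position from the low five bits. It assumes Java 11 or later (for String.repeat); the class and helper names are made up for this example.

```java
// Regenerates the uppercase/lowercase table from real ASCII codes.
final class AsciiCaseBit {
    static final int CASE_BIT = 0b0010_0000; // 32, bit 5 counting from 0

    // Left-pads Integer.toBinaryString to 8 digits, like the table above.
    static String toBinary8(int value) {
        String bits = Integer.toBinaryString(value);
        return "0".repeat(8 - bits.length()) + bits;
    }

    public static void main(String[] args) {
        for (char upper = 'A'; upper <= 'Z'; upper++) {
            char lower = (char) (upper ^ CASE_BIT); // flip only the case bit
            int position = upper & 0b0001_1111;     // low five bits: 1 for A ... 26 for Z
            System.out.println(toBinary8(upper) + " " + upper + "    "
                    + toBinary8(lower) + " " + lower + "    position " + position);
        }
    }
}
```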

Just to fill things out a little, here's an example of what those binary values mean. Let's take the one for G. To understand what 01000111 means in binary:

```
 Pos:   7  6  5  4  3  2  1  0
 Bit:   0  1  0  0  0  1  1  1
 Val: 128 64 32 16  8  4  2  1
Mult:   0 64  0  0  0  4  2  1
 Add: 64 + 4 + 2 + 1 = 71, which is the ASCII code for G.
```
Doing the same thing for the letter G in the special base 10 system I constructed above:

```
  Pos:     4    3    2    1    0
10Bit:     0    0    1    0    7
  Val: 10000 1000  100   10    1
 Mult:     0    0  100    0    7
  Add: 100 + 7 = 107, which is my special 10ASCII code for G.
```
Look back at the "Val" row for binary. Do you see that starting from the right, each value is double the previous one? Doubling each time we get 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, and so on. This is how a binary digit's position determines its value, just like a decimal digit's position determines its value with powers of 10: 1, 10, 100, 1000, 10000, 100000, and so on.

I realize this seems silly because all I did was convert 107 to 107... but 107 isn't just a number, it is a shorthand form for:

```
1 hundred + 0 tens + 7 ones.
```
Another way we could represent that is:

```
0 x 10^4 + 0 x 10^3 + 1 x 10^2 + 0 x 10^1 + 7 x 10^0.
```
Similarly, 01000111 isn't just a binary number, it is a shorthand form for:

```
0 x 2^7 + 1 x 2^6 + 0 x 2^5 + 0 x 2^4 + 0 x 2^3 + 1 x 2^2 + 1 x 2^1 + 1 x 2^0
```
Which is what I already showed you:

```
0 + 64 + 0 + 0 + 0 + 4 + 2 + 1
= 64 + 4 + 2 + 1
= 71
```
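The same place-value rule works for any base. This hypothetical PlaceValue.valueOf helper (not part of any library) sketches it in Java; in real code the JDK's Integer.parseInt(s, radix) does the same job.

```java
// Place-value evaluation: each digit times base^position, positions counted from 0 at the right.
final class PlaceValue {
    static int valueOf(String digits, int base) {
        int total = 0;
        int weight = 1; // base^0 for the rightmost digit
        for (int i = digits.length() - 1; i >= 0; i--) {
            int digit = Character.digit(digits.charAt(i), base); // -1 if not valid in this base
            if (digit < 0) throw new IllegalArgumentException("bad digit: " + digits.charAt(i));
            total += digit * weight;
            weight *= base; // the next position to the left is worth `base` times more
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(valueOf("01000111", 2));          // 71, ASCII G
        System.out.println(valueOf("00107", 10));            // 107, the 10ASCII code for G
        System.out.println(Integer.parseInt("01000111", 2)); // 71 again, via the JDK
    }
}
```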

Also, you may have been wondering what 0x41 and 0x61 mean. The 0x prefix indicates that the digits that follow are to be read as hexadecimal, which is base 16. Our decimal system has only 10 digits, so hexadecimal needs 6 more. It uses the digits 0-9 and treats the letters A-F as the remaining digits, where A is 10, B is 11, and so on up to F, which is 15. Hexadecimal is very convenient for computers because 16 is a power of 2: each hex digit encodes exactly four binary digits, so an 8-bit byte takes exactly two hex digits. Taking 0x41, expanding 4 to its binary representation 0100 and 1 to its binary representation 0001 gives 01000001, which is the code for A shown above. Converted to decimal, it is 4 x 16 + 1 x 1 = 65. We multiply the 4 by 16 because each hexadecimal position is worth 16 times the position to its right, following the same pattern I showed you above for base 2 and base 10.
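A sketch of the nibble expansion just described: each hex digit becomes exactly four bits. The HexNibbles class is illustrative; it relies only on the JDK's Character.digit, Integer.toBinaryString and Integer.parseInt, and String.repeat needs Java 11 or later.

```java
// Expands hex digits into groups of four binary digits.
final class HexNibbles {
    static String hexToBinary(String hex) {
        StringBuilder out = new StringBuilder();
        for (char c : hex.toCharArray()) {
            int nibble = Character.digit(c, 16); // 0..15, or -1 if not a hex digit
            if (nibble < 0) throw new IllegalArgumentException("not hex: " + c);
            String bits = Integer.toBinaryString(nibble);
            out.append("0".repeat(4 - bits.length())).append(bits); // pad to 4 bits
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(hexToBinary("41"));          // 01000001 (A)
        System.out.println(hexToBinary("61"));          // 01100001 (a)
        System.out.println(Integer.parseInt("41", 16)); // 4*16 + 1 = 65
        System.out.println(Integer.parseInt("61", 16)); // 6*16 + 1 = 97
    }
}
```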

I hope this is sufficient for you to understand a little bit more about binary and ASCII codes.

Note 1: You might expect 2 bits in a byte, by analogy with my five 10bits per 10byte. The reason for 8 instead is that back in the early days of computing, it was decided that 8 is a much more useful number of bits: a 2-bit "byte" could encode only 4 values, so transmitting just the upper- and lowercase letters of the alphabet would require 3 bytes per letter! Nothing inherent in binary forces the choice of 8 bits per byte, except that 8 is also a power of 2, which makes a lot of the math involved in working with binary information simpler and lets data align on boundaries more neatly. If they had chosen 6 bits per byte, I am sure things would have worked out awkwardly and would not have made good use of the full range of available values.
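The "3 bytes" figure can be checked by counting how many k-bit units are needed to give at least 52 distinct patterns (26 uppercase plus 26 lowercase letters). The unitsNeeded helper below is a hypothetical illustration, not anything from the book.

```java
// Counts how many k-bit units are needed to represent `symbols` distinct values.
final class UnitsNeeded {
    static int unitsNeeded(int symbols, int bitsPerUnit) {
        int units = 0;
        long patterns = 1; // number of distinct patterns available so far
        while (patterns < symbols) {
            patterns <<= bitsPerUnit; // each extra unit multiplies the patterns by 2^k
            units++;
        }
        return units;
    }

    public static void main(String[] args) {
        int letters = 52; // 26 uppercase + 26 lowercase
        System.out.println(unitsNeeded(letters, 2)); // 3, since 4^2 = 16 < 52 <= 4^3 = 64
        System.out.println(unitsNeeded(letters, 8)); // 1, since 52 <= 256
    }
}
```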

Note 2: My choice of five 10bits per 10byte is based on the impracticality of using ten 10bits per 10byte, which would yield a really huge range of values and waste a lot of storage space. I chose five because ten is evenly divisible by it, which would undoubtedly be useful. (Originally, my answer used ten 10bits per 10byte, but it was too darned big!)

#2

This relationship between the upper case and lower case letters was deliberate. When the ASCII code was formulated, computer hardware was primitive and software needed to conserve every byte. Flipping a single bit takes very little hardware or code to accomplish.

#3

http://asciitable.com/

0x61 is hexadecimal for 97 = a
0x41 is hexadecimal for 65 = A

So subtracting/adding decimal 32 is indeed the way to convert to uppercase/lowercase.

Z is 90 = 0b1011010 = 0x5A
z is 122 = 0b1111010 = 0x7A

That is a difference of 0b00100000 in binary, 0x20 in hexadecimal, or 32 in decimal.

Thus flipping the 6th bit (counting from 1 at the right) changes the case.
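A Java sketch of the add/subtract-32 approach, with a range check so that only ASCII letters are shifted. The class and method names are illustrative, not from any library.

```java
// Case conversion by adding or subtracting the distance between 'a' and 'A'.
final class CaseByArithmetic {
    static final int OFFSET = 'a' - 'A'; // 97 - 65 = 32 (0x20)

    // Only lowercase ASCII letters are shifted; everything else is returned unchanged.
    static char toUpper(char c) {
        return (c >= 'a' && c <= 'z') ? (char) (c - OFFSET) : c;
    }

    // Only uppercase ASCII letters are shifted; everything else is returned unchanged.
    static char toLower(char c) {
        return (c >= 'A' && c <= 'Z') ? (char) (c + OFFSET) : c;
    }

    public static void main(String[] args) {
        System.out.println(toUpper('z'));                    // Z (122 - 32 = 90)
        System.out.println(toLower('Z'));                    // z (90 + 32 = 122)
        System.out.println(Integer.toHexString('z' - 'Z'));  // 20, i.e. 0x20
    }
}
```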

#4

Take a look: the 6th bit has the value 32, so if you flip it you subtract or add 32.

Bit  Value
1    1
2    2
3    4
4    8
5    16
6    32 (32 = hex 20)

Now if you look at http://asciitable.com/, you can see the ASCII table for all the characters and will notice that A = 65 and a = 97.
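One way to see which bit is involved is to XOR the two codes: the result has a 1 exactly where they differ. This hypothetical WhichBit sketch also reprints the bit values from the table above, counting from 1 as this answer does.

```java
// Finds the single bit in which two character codes differ.
final class WhichBit {
    // Returns the value of the one bit in which x and y differ,
    // or -1 if they differ in no bits or in more than one.
    static int singleDifferingBit(char x, char y) {
        int diff = x ^ y; // 1s mark the bits that differ
        return Integer.bitCount(diff) == 1 ? diff : -1;
    }

    public static void main(String[] args) {
        int bit = singleDifferingBit('A', 'a');
        System.out.println(bit);                                // 32
        System.out.println(Integer.numberOfTrailingZeros(bit)); // 5 when counting from 0
        for (int n = 1; n <= 6; n++) {
            // Counting from 1 at the right, the nth bit is worth 2^(n-1).
            System.out.println("bit " + n + " = " + (1 << (n - 1)));
        }
    }
}
```

The same bit is called "bit 6" when counting from 1 and "bit 5" when counting from 0, which is why the answers here number it differently.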

#5

In order to add or subtract 32, you first need to know whether the character is currently uppercase or lowercase.

When this book was written, the programming languages most people were using did not have Strings or anything like .equalsIgnoreCase. This was before internationalization (i18n) was a concern; if a business had a server, you would telnet to it from a terminal (such as xterm) and get a command-line menu. What the author is describing was typically used to build a nice case-insensitive menu for your users, taking advantage of the numeric layout of the ASCII table.

It can be very fast, because there are bitwise assembler instructions that do the conversion in either direction, regardless of whether the characters are already uppercase or lowercase.

c = c | 32 // to lowercase (sets the bit with value 32)

c = c & (1+2+4+8+16+0+64+128) // to uppercase (clears the bit with value 32)
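A runnable Java version of those two lines, as a sketch: the class and method names are made up, and unlike the bare expressions above it checks for a letter first, because OR-ing 32 into a non-letter changes it (for example '@' becomes '`').

```java
// Case conversion by setting or clearing the bit with value 32.
final class CaseByBits {
    static final int CASE_BIT = 32; // 0x20

    static boolean isAsciiLetter(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    // Setting the bit gives lowercase whatever the starting case.
    static char toLower(char c) {
        return isAsciiLetter(c) ? (char) (c | CASE_BIT) : c;
    }

    // Clearing the bit (AND with every other bit) gives uppercase whatever the starting case.
    static char toUpper(char c) {
        return isAsciiLetter(c) ? (char) (c & ~CASE_BIT) : c;
    }

    public static void main(String[] args) {
        System.out.println(toLower('Q')); // q
        System.out.println(toLower('q')); // q (already lowercase, so nothing changes)
        System.out.println(toUpper('q')); // Q
        System.out.println(toUpper('@')); // @ (not a letter, left unchanged)
    }
}
```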

Say you had a Java-like language without objects or standard libraries. Your networking book's author is prompting you to write code like this:

    public static void main()
    {
        println("What would you like to do?");
        println("Inventory (inv)");
        println("Reports (rep)");

        // readUserInput, compareInput and doInventory are hypothetical helpers
        char[] ca = readUserInput();
        for (int i = 0; i < ca.length; i++)
            ca[i] = (char) (ca[i] & ~32);  // convert to uppercase by clearing the bit with value 32
                                           // (only correct for letters)

        if (compareInput(ca, "INV"))
            doInventory();
    }
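For comparison, here is a sketch of the same menu check in actual Java, with the answer's hypothetical readUserInput() and compareInput() replaced by a fixed string and a small matchesKeyword helper (both assumptions for illustration):

```java
// Case-insensitive keyword matching using the ASCII case bit.
final class MenuMatch {
    // Compares typed input with an uppercase keyword, ignoring ASCII case,
    // by clearing the case bit (32) of each lowercase input letter before comparing.
    static boolean matchesKeyword(char[] input, String upperKeyword) {
        if (input.length != upperKeyword.length()) return false;
        for (int i = 0; i < input.length; i++) {
            char c = input[i];
            if (c >= 'a' && c <= 'z') c = (char) (c & ~32); // to uppercase
            if (c != upperKeyword.charAt(i)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Stand-in for the hypothetical readUserInput().
        char[] typed = "iNv".toCharArray();
        if (matchesKeyword(typed, "INV")) System.out.println("doInventory()");
        else if (matchesKeyword(typed, "REP")) System.out.println("doReports()");
    }
}
```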

Have you ever searched Google for a person's name, capitalizing it sometimes and not other times? You get the same results either way, because the search ignores case.

#6

I think most of these answers are unnecessarily complicated and occasionally condescending.

The mapping between numbers and ASCII characters is arbitrary and doesn't really have anything to do with understanding how base 2 or base 10 works. It's purely a matter of convenience. If someone mistakenly coded a lowercase character but meant uppercase, it's more convenient to flip one bit than to recode an entire byte, and flipping one bit is less prone to human error. If the output is 'a' but we wanted 'A', at least we know we got most of the bits right, and we just have to flip the 2^5 bit to add or subtract 32. It's that easy. Why pick bit 5 specifically (it's bit 5, not bit 6 as some have said, because you start counting from 0)? Clearly it's the one that lets two ranges of 26 characters be separated by a single bit flip. If you did this with a lower-valued bit, you'd have to flip more than one.
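The "lower bit would not work" argument can be checked directly: flip a given bit in every code from A (65) to Z (90) and see whether any result lands on another capital letter. This WhyBit5 sketch is illustrative only and tests bits 0 through 5.

```java
// Tests which single bit can separate the uppercase range from its flipped copy.
final class WhyBit5 {
    // True if flipping `bit` moves every uppercase code (65..90) outside 65..90,
    // so a separate lowercase block could sit at the flipped codes.
    static boolean separatesCases(int bit) {
        for (int code = 'A'; code <= 'Z'; code++) {
            int flipped = code ^ (1 << bit);
            if (flipped >= 'A' && flipped <= 'Z') return false; // lands on another capital
        }
        return true;
    }

    public static void main(String[] args) {
        for (int bit = 0; bit <= 5; bit++) {
            System.out.println("bit " + bit + " (value " + (1 << bit) + "): " + separatesCases(bit));
        }
    }
}
```

Among bits 0 to 5, only bit 5 passes; for example, flipping bit 4 turns A (65) into Q (81).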
