Is there a bandwidth improvement from installing a 32-bit operating system on a 64-bit machine?

Time: 2022-09-01 10:05:44

Knuth recently objected to 64-bit systems, saying that for programs which fit in 4 gigs of memory, "they effectively throw away half of the cache" because the pointers are twice as big as on a 32-bit system.

My question is: can this problem be avoided by installing a 32-bit operating system on a 64-bit machine? And are there any bandwidth-intensive benchmarks which demonstrate the advantage in this case?

4 answers

#1


4  

The answer is: yes, to a certain extent, although the performance difference is unlikely to be large.

Any benchmark that tests this has to do a lot of pointer dereferencing, and the effect will be difficult to separate from the noise. Designing a benchmark that the compiler will not optimise away is difficult too. This article about flawed Java benchmarks was posted by someone in response to another question, but many of the principles it describes apply here as well.
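
A minimal Python sketch of the kind of access pattern such a benchmark needs (it is not taken from the article mentioned above). The names `single_cycle` and `chase` are made up for illustration. Every step depends on the result of the previous one, and the final position is returned so the walk cannot be thrown away as dead code. In Python the interpreter overhead will swamp any cache effect, so treat this as a layout to port to a compiled language, built once as 32-bit and once as 64-bit, rather than as a measurement tool:

```python
import random


def single_cycle(n, seed=0):
    """Return a permutation of range(n) that forms one single cycle.

    Uses Sattolo's algorithm, so following the links from any start
    visits every slot exactly once before coming back.
    """
    rng = random.Random(seed)
    links = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # j < i is what guarantees a single cycle
        links[i], links[j] = links[j], links[i]
    return links


def chase(links, steps, start=0):
    """Follow the links `steps` times; each load depends on the last one."""
    pos = start
    for _ in range(steps):
        pos = links[pos]
    return pos  # the caller must use this, otherwise the walk is dead code
```

The random single cycle is there so that a hardware prefetcher cannot guess the next address and so that no part of the table is skipped. Timing `chase` over a table much larger than the cache, in both builds, would be the comparison the question asks about.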

#2


6  

Bandwidth is not really the correct term here. What Knuth was really talking about was data density, as it relates to cache footprint. Imagine that you have a 16KB L1 data cache: if you're purely storing pointers, you can store 2^14/2^2 = 2^12 = 4096 32-bit pointers, but only 2^14/2^3 = 2^11 = 2048 64-bit pointers. If the performance of your application depends on being able to keep track of over 2K different buffers, you may see a real performance benefit from a 32-bit address space.

However, most real code is not like this, and the real performance benefits of a caching system often come from being able to cache common integer and floating-point data structures, not huge quantities of pointers. If your working set is not pointer-heavy, the downside of 64-bit becomes negligible, and the upside becomes much more obvious if you're performing a lot of 64-bit integer arithmetic.
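
The 16KB arithmetic, as a small Python sketch. `pointers_that_fit` is a made-up helper, and the model is deliberately crude: it ignores cache lines, associativity and everything else that shares the cache with the pointers.

```python
def pointers_that_fit(cache_bytes, pointer_bits):
    """How many pointers of the given width fit in a cache of cache_bytes."""
    return cache_bytes // (pointer_bits // 8)


L1_BYTES = 16 * 1024  # the 16KB L1 data cache from the example

for bits in (32, 64):
    print(f"{bits}-bit pointers: {pointers_that_fit(L1_BYTES, bits)}")
# prints:
# 32-bit pointers: 4096
# 64-bit pointers: 2048
```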

#3


4  

I don't think Knuth objected to 64-bit systems as such. He just said that using 64-bit pointers on a system that has less than 4GB of RAM is idiotic (at least if you have lots of pointers, like the ones in a doubly linked list). I can't say that I agree with him. Here are 3 different approaches that could be taken. Let's assume you have a 64-bit capable CPU that can also run in 32-bit mode, such as an Intel Core 2 Duo.

1 - Everything is 32-bit: the OS, the apps, all of it. You get 32-bit pointers, but you cannot use the extra registers/instructions that are only available in 64-bit mode.

2 - Everything is 64-bit: the OS, the apps, all of it. You get 64-bit pointers, and you can use the extra registers/instructions that are available in 64-bit mode. But as you have less than 4GB of RAM, using 64-bit pointers seems idiotic. But is it?

3 - The OS is 64-bit, and it makes sure that all code/data pointers fall in the 0x00000000 - 0xFFFFFFFF range (virtual memory!). The ABI works in an unusual way: all code/data pointers kept in memory or in files are 32 bits wide, but they are zero-extended when loaded into 64-bit registers. If there is a code location to jump to, the compiler/ABI does the necessary fix-ups and performs the actual 64-bit jump. This way pointers are 32-bit, but the apps can still be 64-bit, meaning they can make use of the 64-bit registers and instructions. This process is something like thunking, I think ;-P
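
Option 3 is a compiler/OS matter and cannot be reproduced from application code (as far as I know, the x32 ABI on Linux is an attempt at exactly this). A related trick that gives a similar space saving by hand is to store 32-bit slot indices into a pool instead of native pointers, and widen them to a full address only when they are used. The Python sketch below uses `ctypes` arrays to get real 4-byte links. The `Pool` class, its `NIL` sentinel and all method names are hypothetical, and this is a different technique from the ABI approach described above:

```python
import ctypes


class Pool:
    """Singly linked list whose links are 32-bit slot indices, not pointers."""

    NIL = 0xFFFFFFFF  # sentinel index meaning "no next node" (an assumption)

    def __init__(self, capacity):
        self.values = (ctypes.c_int32 * capacity)()  # payload, 4 bytes per slot
        self.next = (ctypes.c_uint32 * capacity)()   # links, 4 bytes per slot
        self.used = 0
        self.head = self.NIL

    def push_front(self, value):
        if self.used >= len(self.values):
            raise MemoryError("pool is full")
        slot = self.used
        self.used += 1
        self.values[slot] = value
        self.next[slot] = self.head  # store a 32-bit index, not an address
        self.head = slot
        return slot

    def address_of(self, slot):
        """Widen a 32-bit index to a full native address: base + index * size."""
        return ctypes.addressof(self.values) + slot * ctypes.sizeof(ctypes.c_int32)

    def __iter__(self):
        slot = self.head
        while slot != self.NIL:
            yield self.values[slot]
            slot = self.next[slot]


pool = Pool(4)
for v in (1, 2, 3):
    pool.push_front(v)
print(list(pool))                # [3, 2, 1]
print(ctypes.sizeof(pool.next))  # 16: four links of 4 bytes each
```

The cost is the same one the answer hints at: every use of a link needs an extra add (and the pool is capped at 2^32 slots), which is the kind of fix-up the compiler would otherwise have to do for you.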

My conclusion:

The 3rd option seems doable to me, but it is not an easy problem. In theory it can work, but I do not think it is practical. I also think that his quote "When such pointer values appear inside a struct, they not only waste half the memory, they effectively throw away half of the cache." is exaggerated...
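
How much of a struct is actually "wasted" depends on how much of it is pointers. A rough way to check is to lay out a doubly linked list node with `ctypes`, using `c_uint32` and `c_uint64` as stand-ins for 32-bit and 64-bit pointers. That substitution is an assumption of this sketch (real pointer fields would follow the size of the build), and `node_type` is a made-up helper:

```python
import ctypes


def node_type(link_type, payload_words):
    """Doubly linked list node: two links plus a payload of 32-bit ints."""

    class Node(ctypes.Structure):
        _fields_ = [
            ("prev", link_type),
            ("next", link_type),
            ("payload", ctypes.c_int32 * payload_words),
        ]

    return Node


for words in (1, 14):
    small = ctypes.sizeof(node_type(ctypes.c_uint32, words))
    large = ctypes.sizeof(node_type(ctypes.c_uint64, words))
    print(f"payload {words * 4:>2} bytes: {small} vs {large} bytes per node")
```

On a platform where 64-bit integers are 8-byte aligned, the node with a 4-byte payload goes from 12 to 24 bytes (padding included), so for pointer-dominated nodes the "half" in the quote is about right. With a 56-byte payload the node goes from 64 to 72 bytes, which is where "exaggerated" starts to look fair.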

#4


2  

I've seen somewhere that the best mix (on x86 CPUs) is a 64-bit OS with 32-bit applications.

With a 64-bit OS you get:

  • ability to handle more than 4GB of address space
  • more, bigger registers to help in data-copying operations

With a 32-bit app you get:

  • smaller pointers
  • fewer, smaller registers to save on context switches

Cons:

  • all libraries must be duplicated on disk. tiny by hard-disk space standards.
  • all loaded libraries are also duplicated in RAM. not so tiny...

Surprisingly, there seems to be no overhead when switching modes. I guess that crossing from userspace into the kernel costs the same, no matter the bitness of the userspace.

Of course, there are some applications that benefit from a big address space. But for everything else, you can get an extra 5% performance by staying at 32-bit.

And no, I don't care about this small speedup. But it doesn't "offend" me to run 32-bit Firefox on a 64-bit Kubuntu machine (unlike some people I've seen on forums).
