How do I programmatically disable hardware prefetching?

Date: 2022-08-25 23:36:09

I would like to programmatically disable hardware prefetching.

According to the Intel articles "Optimizing Application Performance on Intel® Core™ Microarchitecture Using Hardware-Implemented Prefetchers" and "How to Choose between Hardware and Software Prefetch on 32-Bit Intel® Architecture", I need to update an MSR to disable hardware prefetching.

Here is a relevant snippet:

"DPL Prefetch and L2 Streaming Prefetch settings can also be changed programmatically by writing a device driver utility for changing the bits in the IA32_MISC_ENABLE register – MSR 0x1A0. Such a utility offers the ability to enable or disable prefetch mechanisms without requiring any server downtime.

The table below shows the bits in the IA32_MISC_ENABLE MSR that have to be changed in order to control the DPL and L2 Streaming Prefetch:

Prefetcher Type                              MSR (0x1A0) Bit   Value
DPL (Hardware Prefetch)                      Bit 9             0 = Enable, 1 = Disable
L2 Streamer (Adjacent Cache Line Prefetch)   Bit 19            0 = Enable, 1 = Disable"

I tried using http://etallen.com/msr.html but this did not work. I also tried using wrmsr in asm/msr.h directly but that segfaults. I tried doing this in a kernel module ... and killed the machine.

BTW - I am using kernel 2.6.18-92.el5 and it has MSR linked in the kernel:

$ grep -i msr /boot/config-$(uname -r)
CONFIG_X86_MSR=y
...

4 Solutions

#1


11  

From the Intel reference:
This instruction must be executed at privilege level 0 or in real-address mode; otherwise, a general protection exception #GP(0) will be generated. Specifying a reserved or unimplemented MSR address in ECX will also cause a general protection exception.

...
The CPUID instruction should be used to determine whether MSRs are supported (EDX[5]=1) before using this instruction.

So your fault might be caused by executing the instruction outside privilege level 0 (as any user-space program does), by a CPU that doesn't support MSRs, or by a wrong MSR address.

The kernel source contains many examples of MSR use. For a single CPU, arch/i386/kernel/cpu/intel.c demonstrates disabling prefetch on the Xeon in the function:

static void __cpuinit Intel_errata_workarounds(struct cpuinfo_x86 *c)

The rdmsr macro takes the MSR number and two 32-bit variables that receive the low and high words of the value.
The wrmsr macro takes the MSR number, the low 32-bit word, and the high 32-bit word.

On multi-core or SMP systems, use the variants that take the CPU number as the first argument:
void rdmsr_on_cpu(unsigned int cpu, u32 msr_no, u32 *l, u32 *h);
void wrmsr_on_cpu(unsigned int cpu, u32 msr_no, u32 l, u32 h);
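As a minimal sketch of how a 64-bit MSR value maps onto the two 32-bit arguments of the prototypes above, the following plain C helpers split and rejoin a value. The names msr_low, msr_high and msr_join are made up for illustration and are not kernel APIs.

```c
#include <stdint.h>

/* Hypothetical helpers (not kernel APIs): an MSR holds 64 bits, while
 * wrmsr/wrmsr_on_cpu take the value as separate low and high 32-bit words. */
static inline uint32_t msr_low(uint64_t value)
{
    return (uint32_t)(value & 0xFFFFFFFFu);   /* bits 31..0 */
}

static inline uint32_t msr_high(uint64_t value)
{
    return (uint32_t)(value >> 32);           /* bits 63..32 */
}

static inline uint64_t msr_join(uint32_t low, uint32_t high)
{
    return ((uint64_t)high << 32) | low;
}
```

Inside a kernel module, the two halves would then be passed as the l and h arguments of wrmsr_on_cpu.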

#2


23  

You can enable or disable the hardware prefetchers using msr-tools (http://www.kernel.org/pub/linux/utils/cpu/msr-tools/).

The following enables the hardware prefetcher (by clearing bit 9):

[root@... msr-tools-1.2]# ./wrmsr -p 0 0x1a0 0x60628e2089 
[root@... msr-tools-1.2]# ./rdmsr 0x1a0 
60628e2089

The following disables the hardware prefetcher (by setting bit 9):

[root@... msr-tools-1.2]# ./wrmsr -p 0 0x1a0 0x60628e2289 
[root@... msr-tools-1.2]# ./rdmsr 0x1a0 
60628e2289

Programmatically, you can do this as root by opening /dev/cpu/<cpunumber>/msr and using pwrite to write to the msr "file" at the 0x1a0 offset.
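The pread/pwrite approach can be sketched in C roughly as follows. This is an illustration under assumptions, not a tested tool: the function and macro names are made up, the path is passed in by the caller (for example "/dev/cpu/0/msr"), and it assumes a CPU on which bit 9 of MSR 0x1a0 really is the hardware prefetch disable bit. The msr device expects 8-byte transfers, with the file offset selecting the register.

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

#define IA32_MISC_ENABLE    0x1a0          /* MSR address, used as file offset */
#define HW_PREFETCH_DISABLE (1ULL << 9)    /* bit 9: 1 = prefetcher disabled */

/* Read the 8-byte value at offset `reg`. Returns 0 on success, -1 on failure. */
static int msr_read(const char *path, uint32_t reg, uint64_t *value)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = pread(fd, value, sizeof *value, (off_t)reg);
    close(fd);
    return n == (ssize_t)sizeof *value ? 0 : -1;
}

/* Write an 8-byte value at offset `reg`. Returns 0 on success, -1 on failure. */
static int msr_write(const char *path, uint32_t reg, uint64_t value)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    ssize_t n = pwrite(fd, &value, sizeof value, (off_t)reg);
    close(fd);
    return n == (ssize_t)sizeof value ? 0 : -1;
}

/* Read-modify-write so that all other bits of the register are preserved. */
static int msr_set_hw_prefetch(const char *path, int enable)
{
    uint64_t value;
    if (msr_read(path, IA32_MISC_ENABLE, &value) != 0)
        return -1;
    if (enable)
        value &= ~HW_PREFETCH_DISABLE;
    else
        value |= HW_PREFETCH_DISABLE;
    return msr_write(path, IA32_MISC_ENABLE, value);
}
```

Calling msr_set_hw_prefetch("/dev/cpu/0/msr", 0) as root would then correspond to the wrmsr command that sets bit 9 on CPU 0; the same call has to be repeated for every CPU whose prefetcher should change.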

#3


2  

I am adding an answer here, because the previous ones may not be applicable to all Intel processors.

For my Intel Xeon 5650 (06_2CH family) processor, chapter 35 of the manual specifies that bits 10 to 8 of the IA32_MISC_ENABLE register at address 0x1A0 are reserved. I guess this means I can't toggle the prefetcher on and off through this MSR.

According to an answer from an Intel employee: "Intel has not disclosed how to disable the prefetchers on processors from Nehalem onward. You'll need to disable the prefetchers using options in the BIOS."

#4


2  

In 2014 Intel published information about disabling the hardware prefetchers through MSR 0x1a4 for Nehalem, Westmere, Sandy Bridge, Ivy Bridge, Haswell, and Broadwell (and probably newer cores). The link was found by bholanath:

https://software.intel.com/en-us/articles/disclosure-of-hw-prefetcher-control-on-some-intel-processors Disclosure of H/W prefetcher control on some Intel processors - Vish Viswanathan (Intel), September 24, 2014

This article discloses the MSR setting that can be used to control the various h/w prefetchers that are available on Intel processors based on the following microarchitectures: Nehalem, Westmere, Sandy Bridge, Ivy Bridge, Haswell, and Broadwell.

The above-mentioned processors support 4 types of h/w prefetchers for prefetching data. There are 2 prefetchers associated with the L1 data cache, also known as the DCU (DCU prefetcher, DCU IP prefetcher), and 2 prefetchers associated with the L2 cache (L2 hardware prefetcher, L2 adjacent cache line prefetcher).

There is a Model Specific Register (MSR) on every core with address of 0x1A4 that can be used to control these 4 prefetchers. Bits 0-3 in this register can be used to either enable or disable these prefetchers. Other bits of this MSR are reserved.
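A small C sketch of the bit handling for this register might look like the following. The quoted text only says that bits 0-3 control the four prefetchers; the bit-to-prefetcher assignment and the "1 = disable" polarity used here are taken from the table in the Intel article and should be checked against it. All macro and function names are made up for illustration.

```c
#include <stdint.h>

#define MSR_PREFETCH_CONTROL        0x1a4       /* address from the article */

/* Assumed assignment (verify against the Intel article): 1 = disabled. */
#define L2_HW_PREFETCHER_DISABLE    (1ULL << 0)
#define L2_ADJACENT_LINE_DISABLE    (1ULL << 1)
#define DCU_PREFETCHER_DISABLE      (1ULL << 2)
#define DCU_IP_PREFETCHER_DISABLE   (1ULL << 3)
#define ALL_PREFETCHERS_DISABLE     0xFULL

/* Return `msr` with the selected prefetchers disabled. Only bits 0-3 are
 * ever changed, because the remaining bits of the register are reserved. */
static inline uint64_t prefetch_disable(uint64_t msr, uint64_t which)
{
    return msr | (which & ALL_PREFETCHERS_DISABLE);
}

/* Return `msr` with the selected prefetchers enabled again. */
static inline uint64_t prefetch_enable(uint64_t msr, uint64_t which)
{
    return msr & ~(which & ALL_PREFETCHERS_DISABLE);
}
```

The resulting value would be written back to MSR 0x1a4 on each core, for example through /dev/cpu/<cpunumber>/msr.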

These settings are local to each CPU core and can be changed by root with the help of the Linux msr kernel driver. Intel uses them in its MLC tool to measure memory latency on NUMA systems:

For example, Intel Memory Latency Checker tool (http://www.intel.com/software/mlc) modifies the prefetchers through writes to MSR 0x1a4 to measure accurate latencies and restores them to the original state on exit.
