How to Check CPU Usage in VMware: Thoughts Prompted by an Exam Question

Date: 2023-03-09 00:24:59

This came from a VCP exam question that I found interesting. For reference, see:

How to use esxtop in VMware

http://thocm.com/a/caozuoxitongzixun/xunihuazonghezixun/VMwarexunih/2012/0922/9326.html

Original article: http://www.daemonlord.nl/index.php?action=artikel&cat=10&id=85&artlang=en

Interpreting esxtop Statistics

https://communities.vmware.com/docs/DOC-9279?decorator=print

As shown in the figure:

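Question: To troubleshoot a performance problem on a server, esxtop was used to observe the host. Based on the esxtop output above, how many vCPUs are assigned to Fileserver01?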

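1. CPU load average

Three values are shown: the average physical CPU load over the last 1, 5, and 15 minutes.

These three values are refreshed every 5 seconds.

A value of 1 means the CPUs are 100% utilized, and 0.5 means 50% utilization. A value of 2 clearly means the host is overloaded: either add physical CPUs or reduce the number of virtual machines.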
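As a rough illustration of how to read these values, the sketch below converts a load average into a utilization percentage and flags overload. The function name and the choice of "greater than 1.0" as the overload threshold are assumptions made for this example; they are not part of esxtop.

```python
from typing import Tuple


def interpret_load_average(load: float) -> Tuple[float, bool]:
    """Return (utilization percent, overloaded flag) for one load average value.

    Assumption: any value above 1.0 is treated as overloaded, following the
    rule of thumb that 1.0 corresponds to 100% utilization.
    """
    if load < 0:
        raise ValueError("load average cannot be negative")
    percent = load * 100.0
    overloaded = load > 1.0
    return percent, overloaded


# Hypothetical sample values for the 1, 5, and 15 minute intervals.
for label, load in (("1 min", 0.5), ("5 min", 1.0), ("15 min", 2.0)):
    percent, overloaded = interpret_load_average(load)
    state = "overloaded" if overloaded else "ok"
    print(f"{label}: {percent:.0f}% ({state})")
```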

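2. The PCPU lines show the utilization of individual cores

This is the load on the physical CPUs; AVG is the average. Generally about 80% is considered ideal (this depends on the scenario, and 80% need not be the baseline), while 90% can usually be regarded as close to overload.

If hyperthreading is enabled, LCPUs are shown.

With hyperthreading enabled, PCPU USED differs from PCPU UTIL.

A single physical CPU or core is used as two logical CPUs. A core is a physical hardware block inside the CPU, whereas the threads are logical and share hardware components such as the cache, registers, and execution units. Because of hyperthreading, PCPU USED(%) and PCPU UTIL(%) differ, since the CPU scheduler accounts for usage differently for each. If both threads are busy at the same time, each is credited with only half of that time as effective usage.

PCPU USED(%) - Physical hardware execution context. Can be a physical CPU core if hyperthreading is unavailable or disabled, or a logical CPU (LCPU) or SMT thread if hyperthreading is enabled. This displays the percentage of CPU usage per PCPU and its average over all PCPUs.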

PCPU UTIL(%) - Physical CPU utilised. (real time) Indicates how much time the PCPU was busy, in an unhalted state, in the last snapshot duration. Might differ from PCPU USED(%) due to power management technologies or hyperthreading.

If hyper threading is enabled these figures can be different, likewise if the frequency of the PCPU is changed due to power management these figures can also be adjusted.

As an example, if PCPU USED(%) is 50 and PCPU UTIL(%) is 100, this is because hyperthreading is splitting the load across the two PCPUs of a core. If you then look in the vSphere client, you may notice that CPU usage is 100%. This is because the vSphere client doubles the statistics if hyperthreading is enabled.

On a hyperthreaded core, the CPU scheduler charges each PCPU half of the elapsed time when both PCPUs are busy.

Key explanation:

  • "PCPU USED(%)"

The percentage CPU usage per PCPU, and its average over all PCPUs.

Q: What is the difference between "PCPU UTIL(%)" and "PCPU USED(%)"?

A: While "PCPU UTIL(%)" indicates how much time a PCPU was busy (unhalted) in the last duration, "PCPU USED(%)" shows the amount of "effective work" that has been done by this PCPU. The value of "PCPU USED(%)" can be different from "PCPU UTIL(%)" mainly for the following two reasons:

(1) Hyper-threading

The two PCPUs in a core share a lot of hardware resources, including the execution units and cache. Thus, the "effective work" done by a PCPU when the other PCPU in the core is busy is usually much less than when the other PCPU is idle. Based on this observation, our CPU scheduler charges each PCPU half of the elapsed duration when both PCPUs are busy. If only one PCPU is busy during a time period, the PCPU is charged for all of that time period. Let's use some examples to illustrate this.

  '+' means busy, '-' means idle.

  (1) PCPU 0:   +++++----- (UTIL: 50% / USED: 50%)
      PCPU 1:   -----+++++ (UTIL: 50% / USED: 50%)

  (2) PCPU 0:   +++++----- (UTIL: 50% / USED: 25%)
      PCPU 1:   +++++----- (UTIL: 50% / USED: 25%)

  (3) PCPU 0:   +++++----- (UTIL: 50% / USED: 40%, i.e. 30% + 20%/2)
      PCPU 1:   ---+++++-- (UTIL: 50% / USED: 40%, i.e. 20%/2 + 30%)

In all the three above scenarios, each PCPU is utilized by 50%. But, depending on whether they are busy at the same time, the PCPU USED(%) is between 25% and 50%. Generally speaking,

                                /- PCPU0_UTIL%/2, if PCPU0_UTIL% < PCPU1_UTIL%
  PCPU0_UTIL% >= PCPU0_USED% >= |
                                \- (PCPU0_UTIL% - PCPU1_UTIL%) + PCPU1_UTIL%/2, otherwise

Please note that the above inequalities may not hold due to frequency scaling, which is discussed next.
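The charging rule for hyper-threading can be sketched in a few lines of Python. This is only an illustrative model of the rule as described above, not the scheduler's implementation: the function name is hypothetical, timelines are given as strings of '+' (busy) and '-' (idle) with equal-length time slots, and frequency scaling is ignored.

```python
def util_and_used(own: str, sibling: str) -> tuple:
    """Return (UTIL%, USED%) for the PCPU whose timeline is `own`.

    A busy slot is charged in full when the sibling PCPU is idle and
    charged at half when the sibling is busy in the same slot.
    """
    if len(own) != len(sibling) or not own:
        raise ValueError("timelines must be non-empty and of equal length")
    slots = len(own)
    busy = own.count("+")
    charged = 0.0
    for mine, other in zip(own, sibling):
        if mine == "+":
            charged += 0.5 if other == "+" else 1.0
    return busy * 100 / slots, charged * 100 / slots


# The three scenarios from the text, as (PCPU 0, PCPU 1) timelines.
scenarios = [
    ("+++++-----", "-----+++++"),
    ("+++++-----", "+++++-----"),
    ("+++++-----", "---+++++--"),
]
for pcpu0, pcpu1 in scenarios:
    print("PCPU 0:", util_and_used(pcpu0, pcpu1),
          "PCPU 1:", util_and_used(pcpu1, pcpu0))
```

Under this model the three scenarios give USED values of 50%, 25%, and 40% for each PCPU, while UTIL stays at 50% in all of them.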

(2) Power Management

The frequency of a PCPU may be changed due to power management. Obviously, a PCPU does less "effective work" (in a unit of time) when the frequency is lower. The CPU scheduler adjusts the "PCPU USED(%)" based on the frequency of the PCPU.

  PCPU_USED% = PCPU_UTIL% * Effective_Frequency / Nominal_Frequency

Suppose that UTIL% is 80% and the nominal frequency is 2 GHz. If the effective frequency is 1.5 GHz, USED% would be 80% * 1.5 / 2 = 60%. Please note that since the CPU frequency may change often, you may go to the esxtop power screen by pressing 'p' to see how long the PCPU stays in each state, which can help you estimate the effective frequency.

Please also note that turbo mode may make the effective frequency higher than the nominal frequency. In that case, USED% would be higher than UTIL%.
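The frequency adjustment is simple enough to check by hand, but a small sketch makes the turbo case explicit. The function name is hypothetical, and the 2.5 GHz turbo frequency is an assumed value chosen only for illustration.

```python
def used_from_util(util_percent: float, effective_ghz: float, nominal_ghz: float) -> float:
    """Apply PCPU_USED% = PCPU_UTIL% * Effective_Frequency / Nominal_Frequency."""
    if nominal_ghz <= 0:
        raise ValueError("nominal frequency must be positive")
    return util_percent * effective_ghz / nominal_ghz


# Example from the text: 80% UTIL, 2 GHz nominal, 1.5 GHz effective.
print(used_from_util(80, 1.5, 2.0))

# Assumed turbo case: effective frequency above nominal, so USED% exceeds UTIL%.
print(used_from_util(80, 2.5, 2.0))
```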

If we want to take both reasons into account, just to make it more complicated, we can have something like this.

                                                                          /- PCPU0_UTIL%/2, if PCPU0_UTIL% < PCPU1_UTIL%
  PCPU0_UTIL% >= PCPU0_USED% * Nominal_Frequency / Effective_Frequency >= |
                                                                          \- (PCPU0_UTIL% - PCPU1_UTIL%) + PCPU1_UTIL%/2, otherwise

Q: Why do I see ~100% for the average "PCPU UTIL(%)", but the average "PCPU USED(%)" is ~50%?

A: It is very likely that hyper-threading is enabled. A PCPU is only charged half the time when both PCPUs are busy. Typically,

  0 <= PCPU0_USED% + PCPU1_USED% <= 100% * Effective_Frequency / Base_Frequency

Suppose that the CPU frequency is fixed at the base frequency (e.g. power management features are not used). Then the sum of PCPU USED% for the two PCPUs on the same core cannot exceed 100%, so the average PCPU USED(%) won't be higher than 50%.

Q: Why is average CPU usage in vSphere client ~100%, but, average "PCPU USED(%)" in esxtop is ~50%?

A: Same as above. It is likely due to hyper-threading. The average CPU usage in vSphere client is deliberately doubled when hyper-threading is used; while esxtop does not double the average "PCPU USED(%)", which would otherwise mean the average USED% of all the cores.

Correct answer:

Open esxtop and expand the GID for a VM. You will see 4 "common" worlds for the machine plus 1 world for each vCPU. In this exhibit, FileServer01 has 12 worlds: 4 are the common ones, so the other 8 must be vCPUs. The 4 common worlds are vmx, vmast, vmx-vthread, and vmx-mks, while the vCPU worlds are listed as vmx-vcpu-0, vmx-vcpu-1, vmx-vcpu-2, and so on. All the other VMs in the exhibit have only 1 vCPU.
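The counting argument can be written down as a short sketch. Both helpers are hypothetical and rest on the assumptions stated in the answer: that a VM has exactly 4 common worlds, and that each vCPU world name starts with "vmx-vcpu-". The world list for the second helper is made-up sample data, not output captured from esxtop.

```python
COMMON_WORLDS = 4  # assumption from the answer: vmx, vmast, vmx-vthread, vmx-mks


def vcpus_from_world_count(world_count: int) -> int:
    """Estimate the vCPU count from the number of worlds shown for a VM."""
    if world_count < COMMON_WORLDS:
        raise ValueError("fewer worlds than the assumed common set")
    return world_count - COMMON_WORLDS


def vcpus_from_world_names(names: list) -> int:
    """Count worlds whose names start with the vCPU prefix."""
    return sum(1 for name in names if name.startswith("vmx-vcpu-"))


# FileServer01 shows 12 worlds in the exhibit.
print(vcpus_from_world_count(12))

# Made-up expanded world list for a VM with 2 vCPUs.
sample = ["vmx", "vmast", "vmx-vthread", "vmx-mks", "vmx-vcpu-0", "vmx-vcpu-1"]
print(vcpus_from_world_names(sample))
```

Counting by name prefix is the more robust of the two, since the number of non-vCPU worlds may differ between ESXi versions and VM configurations.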