Getting cache info using C/C++ with inline assembly/intrinsics in osx

Time: 2022-10-31 15:13:36

I wrote the following program, using both gcc's __get_cpuid and inline assembly, to get the cache info of my laptop, but I cannot match the values it returns against the table (Encoding of Cache and TLB Descriptors) I found online.

#include <stdio.h>
#include <stdint.h>
#include <cpuid.h>

static inline void cpuid(uint32_t *eax, uint32_t *ebx,
                         uint32_t *ecx, uint32_t *edx);

int main(void) {
    uint32_t a, b, c, d;
    uint32_t eax, ebx, ecx, edx;
    eax = 2; /* leaf 2: cache and TLB descriptor information */
    ecx = 0; /* sub-leaf; ignored by leaf 2, but passed to cpuid below */
    uint32_t command = 2;
    cpuid(&eax, &ebx, &ecx, &edx);
    __get_cpuid(command, &a, &b, &c, &d);

    printf("eax: %08x\n", eax);
    printf("ebx: %08x\n", ebx);
    printf("ecx: %08x\n", ecx);
    printf("edx: %08x\n", edx);

    printf("a: %08x\n", a);
    printf("b: %08x\n", b);
    printf("c: %08x\n", c);
    printf("d: %08x\n", d);
    return 0;
}

static inline void cpuid(uint32_t *eax, uint32_t *ebx,
                         uint32_t *ecx, uint32_t *edx)
{
        /* ecx is often an input as well as an output, so pass it in too. */
        asm volatile ("cpuid"
            : "=a" (*eax),
              "=b" (*ebx),
              "=c" (*ecx),
              "=d" (*edx)
            : "0" (*eax), "2" (*ecx));
}

My output:

eax: 76036301
ebx: 00f0b5ff
ecx: 00000000
edx: 00c10000
a: 76036301
b: 00f0b5ff
c: 00000000
d: 00c10000

I found that table here.

Running sysctl hw.cachesize, I find that:

L1 cache: 32KB
L2 cache: 256KB
L3 cache: 6MB

My environment:

system: OS X 10.10.1
compiler: clang-602.0.53
CPU: i7-4850HQ 2.3GHz

What's wrong with my program? It should be working, since both methods give the same result, so I am confused. Thank you!

EDIT: I tried what Mats suggested and got the following output:

gcc intrinsic
a: 76036301
b: 00f0b5ff
c: 00000000
d: 00c10000
eax: 2
eax: 76036301
ebx: 00f0b5ff
ecx: 00000000
edx: 00c10000
eax: 4, ecx: 0
eax: 1c004121
ebx: 01c0003f
ecx: 0000003f
edx: 00000000
eax: 4, ecx: 1
eax: 1c004122
ebx: 01c0003f
ecx: 0000003f
edx: 00000000
eax: 4, ecx: 2
eax: 1c004143
ebx: 01c0003f
ecx: 000001ff
edx: 00000000
eax: 4, ecx: 3
eax: 1c03c163
ebx: 02c0003f
ecx: 00001fff
edx: 00000006
eax: 4, ecx: 4
eax: 1c03c183
ebx: 03c0f03f
ecx: 00001fff
edx: 00000004
eax: 4, ecx: 5
eax: 00000000
ebx: 00000000
ecx: 00000000
edx: 00000000

I looked up the table here:

static cpuid_cache_descriptor_t intel_cpuid_leaf2_descriptor_table[] = {

//  -------------------------------------------------------
//  value   type    level       ways    size    entries
//  -------------------------------------------------------
    { 0x00, _NULL_, NA,     NA, NA, NA  },
    { 0x01, TLB,    INST,       4,  SMALL,  32  },  
    { 0x02, TLB,    INST,       FULLY,  LARGE,  2   },  
    { 0x03, TLB,    DATA,       4,  SMALL,  64  },  
    { 0x04, TLB,    DATA,       4,  LARGE,  8   },  
    { 0x05, TLB,    DATA1,      4,  LARGE,  32  },  
    { 0x06, CACHE,  L1_INST,    4,  8*K,    32  },
    { 0x08, CACHE,  L1_INST,    4,  16*K,   32  },
    { 0x09, CACHE,  L1_INST,    4,  32*K,   64  },
    { 0x0A, CACHE,  L1_DATA,    2,  8*K,    32  },
    { 0x0B, TLB,    INST,       4,  LARGE,  4   },  
    { 0x0C, CACHE,  L1_DATA,    4,  16*K,   32  },
    { 0x0D, CACHE,  L1_DATA,    4,  16*K,   64  },
    { 0x0E, CACHE,  L1_DATA,    6,  24*K,   64  },
    { 0x21, CACHE,  L2,     8,  256*K,  64  },
    { 0x22, CACHE,  L3_2LINESECTOR, 4,  512*K,  64  },
    { 0x23, CACHE,  L3_2LINESECTOR, 8,  1*M,    64  },
    { 0x25, CACHE,  L3_2LINESECTOR, 8,  2*M,    64  },
    { 0x29, CACHE,  L3_2LINESECTOR, 8,  4*M,    64  },
    { 0x2C, CACHE,  L1_DATA,    8,  32*K,   64  },
    { 0x30, CACHE,  L1_INST,    8,  32*K,   64  },
    { 0x40, CACHE,  L2,     NA, 0,  NA  },
    { 0x41, CACHE,  L2,     4,  128*K,  32  },
    { 0x42, CACHE,  L2,     4,  256*K,  32  },
    { 0x43, CACHE,  L2,     4,  512*K,  32  },
    { 0x44, CACHE,  L2,     4,  1*M,    32  },
    { 0x45, CACHE,  L2,     4,  2*M,    32  },
    { 0x46, CACHE,  L3,     4,  4*M,    64  },
    { 0x47, CACHE,  L3,     8,  8*M,    64  },
    { 0x48, CACHE,  L2,     12,     3*M,    64  },
    { 0x49, CACHE,  L2,     16, 4*M,    64  },
    { 0x4A, CACHE,  L3,     12,     6*M,    64  },
    { 0x4B, CACHE,  L3,     16, 8*M,    64  },
    { 0x4C, CACHE,  L3,     12,     12*M,   64  },
    { 0x4D, CACHE,  L3,     16, 16*M,   64  },
    { 0x4E, CACHE,  L2,     24, 6*M,    64  },
    { 0x4F, TLB,    INST,       NA, SMALL,  32  },  
    { 0x50, TLB,    INST,       NA, BOTH,   64  },  
    { 0x51, TLB,    INST,       NA, BOTH,   128 },  
    { 0x52, TLB,    INST,       NA, BOTH,   256 },  
    { 0x55, TLB,    INST,       FULLY,  BOTH,   7   },  
    { 0x56, TLB,    DATA0,      4,  LARGE,  16  },  
    { 0x57, TLB,    DATA0,      4,  SMALL,  16  },  
    { 0x59, TLB,    DATA0,      FULLY,  SMALL,  16  },  
    { 0x5A, TLB,    DATA0,      4,  LARGE,  32  },  
    { 0x5B, TLB,    DATA,       NA, BOTH,   64  },  
    { 0x5C, TLB,    DATA,       NA, BOTH,   128 },  
    { 0x5D, TLB,    DATA,       NA, BOTH,   256 },  
    { 0x60, CACHE,  L1,     8,  16*K,   64  },
    { 0x61, CACHE,  L1,     4,  8*K,    64  },
    { 0x62, CACHE,  L1,     4,  16*K,   64  },
    { 0x63, CACHE,  L1,     4,  32*K,   64  },
    { 0x70, CACHE,  TRACE,      8,  12*K,   NA  },
    { 0x71, CACHE,  TRACE,      8,  16*K,   NA  },
    { 0x72, CACHE,  TRACE,      8,  32*K,   NA  },
    { 0x78, CACHE,  L2,     4,  1*M,    64  },
    { 0x79, CACHE,  L2_2LINESECTOR, 8,  128*K,  64  },
    { 0x7A, CACHE,  L2_2LINESECTOR, 8,  256*K,  64  },
    { 0x7B, CACHE,  L2_2LINESECTOR, 8,  512*K,  64  },
    { 0x7C, CACHE,  L2_2LINESECTOR, 8,  1*M,    64  },
    { 0x7D, CACHE,  L2,     8,  2*M,    64  },
    { 0x7F, CACHE,  L2,     2,  512*K,  64  },
    { 0x80, CACHE,  L2,     8,  512*K,  64  },
    { 0x82, CACHE,  L2,     8,  256*K,  32  },
    { 0x83, CACHE,  L2,     8,  512*K,  32  },
    { 0x84, CACHE,  L2,     8,  1*M,    32  },
    { 0x85, CACHE,  L2,     8,  2*M,    32  },
    { 0x86, CACHE,  L2,     4,  512*K,  64  },
    { 0x87, CACHE,  L2,     8,  1*M,    64  },
    { 0xB0, TLB,    INST,       4,  SMALL,  128 },  
    { 0xB1, TLB,    INST,       4,  LARGE,  8   },  
    { 0xB2, TLB,    INST,       4,  SMALL,  64  },  
    { 0xB3, TLB,    DATA,       4,  SMALL,  128 },  
    { 0xB4, TLB,    DATA1,      4,  SMALL,  256 },  
    { 0xBA, TLB,    DATA1,      4,  BOTH,   64  },  
    { 0xCA, STLB,   DATA1,      4,  BOTH,   512 },  
    { 0xD0, CACHE,  L3,     4,  512*K,  64  },  
    { 0xD1, CACHE,  L3,     4,  1*M,    64  },  
    { 0xD2, CACHE,  L3,     4,  2*M,    64  },  
    { 0xD3, CACHE,  L3,     4,  4*M,    64  },  
    { 0xD4, CACHE,  L3,     4,  8*M,    64  },  
    { 0xD6, CACHE,  L3,     8,  1*M,    64  },  
    { 0xD7, CACHE,  L3,     8,  2*M,    64  },  
    { 0xD8, CACHE,  L3,     8,  4*M,    64  },  
    { 0xD9, CACHE,  L3,     8,  8*M,    64  },  
    { 0xDA, CACHE,  L3,     8,  12*M,   64  },  
    { 0xDC, CACHE,  L3,     12,     1536*K, 64  },  
    { 0xDD, CACHE,  L3,     12,     3*M,    64  },  
    { 0xDE, CACHE,  L3,     12,     6*M,    64  },  
    { 0xDF, CACHE,  L3,     12, 12*M,   64  },  
    { 0xE0, CACHE,  L3,     12, 18*M,   64  },  
    { 0xE2, CACHE,  L3,     16, 2*M,    64  },  
    { 0xE3, CACHE,  L3,     16, 4*M,    64  },  
    { 0xE4, CACHE,  L3,     16, 8*M,    64  },  
    { 0xE5, CACHE,  L3,     16, 16*M,   64  },  
    { 0xE6, CACHE,  L3,     16, 24*M,   64  },  
    { 0xF0, PREFETCH, NA,       NA, 64, NA  },  
    { 0xF1, PREFETCH, NA,       NA, 128,    NA  }   
};

The problem now is that I still cannot get the correct size of my L3 cache (when ecx = 1, I get 22, i.e. 512K, but the correct value is 6MB). There also seems to be a conflict about the size of my L2 cache (43 when ecx = 2 and 21 when ecx = 0).

2 Answers

#1


So, your data seems to be reasonably correct; you are just using an old reference. Unfortunately, Intel's website is either broken at the moment or doesn't like Firefox and/or Linux.

76036301 gives

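76 means trace cache with 64K ops according to the old table; Intel's current descriptor table lists 76 as an instruction TLB for 2M/4M pages, fully associative, with 8 entries.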

03 means a 4-way data TLB with 64 entries.

63 is a 32KB L1 cache; the source here shows that value, which is not in your docs.

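01 would be a 4-way instruction TLB with 32 entries if it were a descriptor, but the low byte of EAX (AL) is not one: for leaf 2 it always reads 01 and should be ignored.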

00f0b5ff gives

00 means "nothing" (the null descriptor).

f0 means 64-byte prefetching.

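b5 is not documented even at that link (my guess: a small-page data TLB).

ff means that leaf 2 does not report cache descriptors at all, and that CPUID leaf 4 must be used to query the cache parameters. (There is no 0b byte in this register; 0b would be a 4-way instruction TLB for large pages with 4 entries.)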

To get the L2 and L3 cache sizes, you need to use CPUID with EAX=4 and set ECX to 0, 1, 2, ... to step through the caches one at a time. The linked code shows this, and Intel's docs have details on which bits mean what.

#2


Intel's Instruction Set Reference has all the relevant information you need (at around page 263) and, unlike every other source I have found, is actually up to date.

Probably the best way to get the cache info is the one described in that reference.

When eax = 4 and ecx is the index of the cache being queried (0, 1, 2, ...),

Ways = EBX[31:22]

Partitions = EBX[21:12]

LineSize = EBX[11:0]

Sets = ECX

Total Size = (Ways + 1) * (Partitions + 1) * (LineSize + 1) * (Sets + 1)

So when CPUID is called with eax = 4 and ecx = 3, you can get your L3 cache size by doing the computation above. Using the OP's posted data:

ebx: 02c0003f
ecx: 00001fff

Ways = 11
Partitions = 0
LineSize = 63
Sets = 8191

Total L3 cache size = 12 * 1 * 64 * 8192 = 6291456 bytes (6MB)

Which is what was expected.
