Intel x86 VT Virtualization (3): An Introduction to the x64 Multi-Core Code

Date: 2024-02-20 20:34:31

    Generally, when doing Windows kernel and VT experiments we install VMware or VirtualBox on a physical machine, install Windows inside the VM, and then install windbg on the physical machine and attach it to the VM so the VM's Windows kernel can be debugged through windbg. For VT experiments, the VM's VT support must also be enabled, which brings nested virtualization into play. The overall architecture is as follows:

  (figure: the nested-virtualization levels L0/L1/L2, with windbg on the physical host attached to the VM)

  L0 = Code that runs on a physical host. Runs a hypervisor;

  L1 = L0’s hypervisor guest. Runs the hypervisor we want to debug;

  L2 = L1’s hypervisor guest;

  As the figure above shows, windbg can be used to debug both the guest OS code and the host OS (hypervisor) code.

 

  The previous article used 周壑's VT framework. Its strengths are concise code and a clear framework structure, which make it a good starting point for beginners; its weaknesses are that it is 32-bit only (it cannot run on x64) and single-core. This time I recommend another framework, hosted at https://github.com/zhuhuibeishadiao , which contains two projects, miniVT64 and PFHook. I suggest starting with miniVT64, again because the logic is simple, the code is small, and it is easy to get into.

   1. The very first run of the code blue-screened. The error type shown in windbg was the familiar C0000005, access violation, i.e., memory that could not be accessed.


  The faulting instruction: invvpid.


 To fully understand the cause and fix the bug, here is a brief introduction to what the invvpid instruction does. The key points:

 (1) Intel's VPID (Virtual-Processor Identifier) is a 16-bit field; every TLB entry is associated with a VPID, which uniquely identifies a VCPU.

   (2) During virtual-to-physical address translation, a TLB entry may be used to translate the address only if its VPID matches the VPID of the VCPU of the currently running virtual machine.

   (3) The VPID identifies which VCPU a TLB entry belongs to, so existing entries can be kept in the TLB across VM switches, reducing pointless TLB flushes.

   (4) The second operand of invvpid is called the descriptor. It is 128 bits in total: bits 0-15 hold the VPID and bits 64-127 hold the cached linear address. (Caching these translations cuts down the memory reads the CPU needs during address translation and improves performance.) A C sketch of this layout is given right after this list.

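  The following is only a reference sketch of that 128-bit descriptor, written from the Intel SDM description above; the struct and field names are mine, not taken from miniVT64:

#pragma pack(push, 1)
typedef struct _INVVPID_DESCRIPTOR
{
    UINT16 Vpid;          // bits 0-15:  VPID whose cached translations are targeted
    UINT16 Reserved[3];   // bits 16-63: must be zero
    UINT64 LinearAddress; // bits 64-127: linear address (only used by the single-address invalidation type)
} INVVPID_DESCRIPTOR, *PINVVPID_DESCRIPTOR;
#pragma pack(pop)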

     Back to the bug itself: the function takes two parameters, passed in rcx and rdx. Looking at the register context at the time of the fault, rcx=2, which means "invalidate the cached translations for all VPIDs except 0000H". Given the access-violation message, the second parameter, the descriptor, is the likely culprit, because that is where memory is accessed.


  Back in windbg, dumping the contents at the address held in rax with dq showed nothing wrong at all. That was odd: the contents at that address could be read, yet windbg still reported an access violation. How could that be? Reading further in the instruction reference at https://www.felixcloutier.com/x86/invvpid turned up an important detail: a page fault while accessing the memory operand raises an exception, which explains why the instruction failed.

 

  By the time invvpid executes, VMX is already enabled and we are running as the host OS. But this host OS has only just started and has essentially nothing in place: the VMCS is not yet set up, and the segment registers, control registers, and GDT/IDT have not been configured. It is a completely bare-bones stage; if a page fault happens now, there is no way to even locate the missing page, and the machine can only die. The second operand of invvpid therefore must be placed in non-paged memory, guaranteeing it can never be swapped out to disk.

   

   The improved code: allocate a 128-bit (16-byte) block of non-paged memory and pass it in as the descriptor. A rough sketch of the idea follows.

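  The sketch below is mine, not miniVT64's actual code: Asm_Invvpid stands in for the framework's two-parameter asm stub (type in rcx, descriptor address in rdx), the pool tag is arbitrary, and INVVPID_DESCRIPTOR is the type sketched earlier; it assumes a kernel-mode driver built against ntddk.h.

extern VOID Asm_Invvpid(ULONG_PTR Type, PVOID Descriptor); // assumed asm stub: invvpid rcx, [rdx]

NTSTATUS InvalidateAllVpidTranslations(VOID)
{
    const ULONG_PTR INVVPID_ALL_CONTEXT = 2;   // type 2: invalidate translations for every VPID except 0000H

    // The descriptor must live in non-paged memory: a page fault taken inside the
    // half-initialized host cannot be serviced and brings the machine down.
    PINVVPID_DESCRIPTOR Desc = (PINVVPID_DESCRIPTOR)
        ExAllocatePoolWithTag(NonPagedPool, sizeof(INVVPID_DESCRIPTOR), 'dpvI');
    if (Desc == NULL)
        return STATUS_INSUFFICIENT_RESOURCES;

    RtlZeroMemory(Desc, sizeof(INVVPID_DESCRIPTOR)); // VPID and linear address are ignored for type 2
    Asm_Invvpid(INVVPID_ALL_CONTEXT, Desc);

    ExFreePoolWithTag(Desc, 'dpvI');
    return STATUS_SUCCESS;
}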

  Note that even after entering the host, allocating memory, converting virtual addresses to physical ones (to put it bluntly: we still depend on the page tables maintained by the guest OS to perform that translation), and so on all rely on the guest OS's APIs; at this stage the host is still just an empty shell.
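  As a generic illustration (again a sketch of my own, not miniVT64's code), even obtaining the physical address of the VMXON region goes through the guest-maintained page tables via MmGetPhysicalAddress:

typedef struct _VMXON_SETUP
{
    PVOID            VirtualBase;
    PHYSICAL_ADDRESS PhysicalBase;
} VMXON_SETUP;

static BOOLEAN AllocateVmxonRegion(VMXON_SETUP *Region)
{
    // Allocation itself is a guest OS service.
    Region->VirtualBase = ExAllocatePoolWithTag(NonPagedPool, PAGE_SIZE, 'noxV');
    if (Region->VirtualBase == NULL)
        return FALSE;

    RtlZeroMemory(Region->VirtualBase, PAGE_SIZE);

    // MmGetPhysicalAddress walks the paging structures set up by the guest OS;
    // the "host" has no address-translation machinery of its own yet.
    Region->PhysicalBase = MmGetPhysicalAddress(Region->VirtualBase);
    return TRUE;
}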

  

 2. Just as the single-stepping was going along happily, the next problem arrived right on its heels: the faulting instruction was xsaves [rcx].


  Looking at the call stack at the time of the fault and at the faulting code inside the driver, the exception this time came from swapcontext, so it presumably occurred while switching threads. Same routine as before: first look up what the instruction does: https://www.felixcloutier.com/x86/xsaves

 "Performs a full or partial save of processor state components to the XSAVE area located at the memory address specified by the destination operand": in other words, it saves assorted pieces of processor state to the memory region named by the instruction. Here that memory is [rcx], so the first thing to check is whether reads and writes to that block fail; judging from the output below, the memory region is fine:

kd> dq ffffd40acb595cc0
ffffd40a`cb595cc0  00000000`00000000 00000000`00000000
ffffd40a`cb595cd0  00000000`00000000 00000000`00001f80
ffffd40a`cb595ce0  00000000`00000000 00000000`00000000
ffffd40a`cb595cf0  00000000`00000000 00000000`00000000
ffffd40a`cb595d00  00000000`00000000 00000000`00000000
ffffd40a`cb595d10  00000000`00000000 00000000`00000000
ffffd40a`cb595d20  00000000`00000000 00000000`00000000
ffffd40a`cb595d30  00000000`00000000 00000000`00000000
kd> r cr3
cr3=00000000001aa000
kd> !vtop 00000000001aa000 fffff800bf80734c
Amd64VtoP: Virt fffff800bf80734c, pagedir 00000000001aa000
Amd64VtoP: PML4E 00000000001aaf80
Amd64VtoP: PDPE 0000000001109010
Amd64VtoP: PDE 000000000110afe0
Amd64VtoP: PTE 0000000001095038
Amd64VtoP: Mapped phys 000000000220734c
Virtual address fffff800bf80734c translates to physical address 220734c.

  From the log, the exception arose only after execution had returned to the guest OS. Given where it happened, the natural suspicion was that xsaves was triggering a VM exit that the host OS failed to handle properly.

kd> g
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting control for msr 0x481
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting controls (low): 0x00000016
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting controls (high): 0x0000003f
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting control for msr 0x483
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting controls (low): 0x00036dff
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting controls (high): 0x003fffff
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting control for msr 0x484
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting controls (low): 0x000011ff
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting controls (high): 0x0000f3ff
FGP [VT] : [#0][IRQL=0x2](Virtualize): CPU: 0xFFFF8C02E51DFF60 
FGP [VT] : [#0][IRQL=0x2](Virtualize): rsp: 0xffffd40acb595ac8 
FGP [VT] : [#0][IRQL=0x2](ResumeGuest): Resuming guest...

  Reading on in the Intel manual, "Table 24-7. Definitions of Secondary Processor-Based VM-Execution Controls" gives the key piece of information:


   If bit 20 is set to 0, any execution of xsaves causes a #UD (undefined opcode) exception.

        Back in the SetupVMCS function, it is enough to set this bit to 1 when the secondary controls are written with vmwrite; a rough sketch of the change follows.

 
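  This is only a sketch, not the framework's exact code: the function name is mine, and it assumes the secondary controls are adjusted against the IA32_VMX_PROCBASED_CTLS2 capability MSR the same way AdjustControls treats the other control fields.

#define IA32_VMX_PROCBASED_CTLS2   0x48B        // capability MSR for the secondary controls
#define SECONDARY_VM_EXEC_CONTROL  0x0000401E   // VMCS field encoding
#define ENABLE_XSAVES_XRSTORS      (1UL << 20)  // bit 20 of the secondary controls

VOID WriteSecondaryControlsWithXsaves(ULONG SecondaryControls)
{
    ULARGE_INTEGER Cap;
    Cap.QuadPart = __readmsr(IA32_VMX_PROCBASED_CTLS2);

    SecondaryControls |= ENABLE_XSAVES_XRSTORS;
    SecondaryControls &= Cap.HighPart;   // clear bits the CPU does not allow to be 1
    SecondaryControls |= Cap.LowPart;    // set bits the CPU requires to be 1

    __vmx_vmwrite(SECONDARY_VM_EXEC_CONTROL, SecondaryControls);
}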

   3. Running further, yet another bug appeared. The log:

kd> g
FGP [VT] : [#0][IRQL=0x0](DriverEntry): Dirver is StartFGP [VT] : [#0][IRQL=0x0](DriverEntry): Dirver is Start
FGP [VT] : [#0][IRQL=0x0](VtStart): virtualizing 1 processors ...
FGP [VT] : [#0][IRQL=0x0](VtStart): Allocated g_cpus array @ 0xffff8c02e3f85370, size=0x8
FGP [VT] : [#0][IRQL=0x2](SetupVMX): VMXON region size: 0x0
FGP [VT] : [#0][IRQL=0x2](SetupVMX): VMX revision ID: 0x1
FGP [VT] : [#0][IRQL=0x2](SetupVMX): VMON memory virtual address ffffa20189ea0000
FGP [VT] : [#0][IRQL=0x2](SetupVMX): VMON physical address 7c0e8000
FGP [VT] : [#0][IRQL=0x2](SetupVMX): VMCS memory virtual address ffffa20189ea6000
FGP [VT] : [#0][IRQL=0x2](SetupVMX): VMCS physical address 7c085000
FGP [VT] : [#0][IRQL=0x2](SetupVMCS): GuestRsp=FFFFD40ACA21BB28
FGP [VT] : [#0][IRQL=0x2](SetupVMCS): VMCS PHYSICAL_ADDRESS 7c085000
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting control for msr 0x481
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting controls (low): 0x00000016
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting controls (high): 0x0000003f
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting control for msr 0x483
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting controls (low): 0x00036dff
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting controls (high): 0x003fffff
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting control for msr 0x484
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting controls (low): 0x000011ff
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting controls (high): 0x0000f3ff
FGP [VT] : [#0][IRQL=0x2](Virtualize): CPU: 0xFFFF8C02E2CE7F60 
FGP [VT] : [#0][IRQL=0x2](Virtualize): rsp: 0xffffd40aca21bac8 
FGP [VT] : [#0][IRQL=0x2](HandleVmExit): Exit code: 16
FGP [VT] : [#0][IRQL=0x2](HandleRdtsc): vmx: HandleRdtsc(): rax = 0x0, rdx = 0x80000003
FGP [VT] : [#0][IRQL=0x2](HandleVmExit): Exit code: 28
FGP [VT] : [#0][IRQL=0x2](HandleCrAccess): HandleCrAccess: pExitQualification->ControlRegister = 3
FGP [VT] : [#0][IRQL=0x2](HandleVmExit): Exit code: 10
FGP [VT] : [#0][IRQL=0x2](HandleCpuid): vmx: HandleCpuid(): guest_rip = 0xfffff800bf62a4b4
FGP [VT] : [#0][IRQL=0x2](HandleVmExit): Exit code: 10
FGP [VT] : [#0][IRQL=0x2](HandleCpuid): vmx: HandleCpuid(): guest_rip = 0xfffff800bf63bb39
FGP [VT] : [#0][IRQL=0x2](HandleVmExit): Exit code: 10
FGP [VT] : [#0][IRQL=0x2](HandleCpuid): vmx: HandleCpuid(): guest_rip = 0xfffff800bf63bada
FGP [VT] : [#0][IRQL=0x2](HandleVmExit): Exit code: 31
FGP [VT] : [#0][IRQL=0x2](HandleMsrRead): vmx: HandleMsrRead(): msr = 0x1b, Msr.LowPart = 0xfee00d00, Msr.HighPart = 0x0
FGP [VT] : [#0][IRQL=0x2](HandleVmExit): Exit code: 31
FGP [VT] : [#0][IRQL=0x2](HandleMsrRead): vmx: HandleMsrRead(): msr = 0x1b, Msr.LowPart = 0xfee00d00, Msr.HighPart = 0x0
FGP [VT] : [#0][IRQL=0x2](HandleVmExit): Exit code: 31
FGP [VT] : [#0][IRQL=0x2](HandleMsrRead): vmx: HandleMsrRead(): msr = 0x1b, Msr.LowPart = 0xfee00d00, Msr.HighPart = 0x0
FGP [VT] : [#0][IRQL=0x2](HandleVmExit): Exit code: 31
FGP [VT] : [#0][IRQL=0x2](HandleMsrRead): vmx: HandleMsrRead(): msr = 0x40000105, Msr.LowPart = 0x0, Msr.HighPart = 0x80000000
FGP [VT] : [#0][IRQL=0x2](HandleVmExit): Exit code: 32
FGP [VT] : [#0][IRQL=0x2](HandleMsrWrite): vmx: HandleMsrWrite(): msr = 0x40000100, rax = 0x7f, rdx = 0x0
FGP [VT] : [#0][IRQL=0x2](HandleVmExit): Exit code: 32
FGP [VT] : [#0][IRQL=0x2](HandleMsrWrite): vmx: HandleMsrWrite(): msr = 0x40000101, rax = 0x8, rdx = 0x0
FGP [VT] : [#0][IRQL=0x2](HandleVmExit): Exit code: 32
FGP [VT] : [#0][IRQL=0x2](HandleMsrWrite): vmx: HandleMsrWrite(): msr = 0x40000102, rax = 0xc184de70, rdx = 0xfffff800
FGP [VT] : [#0][IRQL=0x2](HandleVmExit): Exit code: 32
FGP [VT] : [#0][IRQL=0x2](HandleMsrWrite): vmx: HandleMsrWrite(): msr = 0x40000103, rax = 0x10001f, rdx = 0x0
FGP [VT] : [#0][IRQL=0x2](HandleVmExit): Exit code: 32
FGP [VT] : [#0][IRQL=0x2](HandleMsrWrite): vmx: HandleMsrWrite(): msr = 0x40000104, rax = 0xbfe2bc98, rdx = 0xfffff800
FGP [VT] : [#0][IRQL=0x2](HandleVmExit): Exit code: 32
FGP [VT] : [#0][IRQL=0x2](HandleMsrWrite): vmx: HandleMsrWrite(): msr = 0x40000105, rax = 0x0, rdx = 0x80000000

  This time the virtual machine froze solid: mouse clicks produced no response. windbg showed "running" but could not break in, so it looked hung as well. The last log line shows the guest OS writing to MSR 0x40000105, and some googling turned up part of the answer (https://git.qemu.org/?p=qemu.git;a=blob_plain;f=docs/hyperv.txt;hb=master):

  “write to HV_X64_MSR_CRASH_CTL causes guest to shutdown. This effectively blocks crash dump generation by Windows”

  Writing to MSR 0x40000105 makes the guest shut down, so the next step was to find out what was causing the guest OS to write to MSR 0x40000105 in the first place. After stepping through the code line by line and comparing it with other VT frameworks, the pitfalls in this miniVT code finally came to light:

  • Interrupts are not disabled after entering the VMM. If an interrupt arrives at that point, the CPU jumps to the interrupt routine, because HOST_IDTR_BASE has already been set in the VMCS (and it still points at the guest OS's interrupt vectors); this unbalances the stack and scrambles the register values saved on it.
  • The guest OS's register context is saved on the stack, and rsp is not saved correctly.


   After switching to the approach used in PFHook it ran normally; the same pattern shows up in the listing below: cli before the setup, and the registers saved in a dedicated data-segment area instead of on the stack.

   4. (1) The key multi-core code: iterate over every core and set up the required memory separately for each one; different cores must never share the same block of memory for saving their data.

NTSTATUS StartVirtualTechnology()
{
    Asm_int3();
    KeInitializeMutex(&g_GlobalMutex,0);   // initialize the global mutex
    KeWaitForMutexObject(&g_GlobalMutex,Executive,KernelMode,FALSE,0);
    g_Pml4 = EptInitialization();

    for (int i = 0;i<KeNumberProcessors;i++)
    {
        KeSetSystemAffinityThread((KAFFINITY)1 << i);   // pin the current thread to CPU i (cast before shifting so cores beyond bit 31 still work)

        SetupVT(); // set up VT; each core allocates its own VMXON and VMCS memory, and cores must never share one block, or the machine blue-screens

        KeRevertToUserAffinityThread();   // restore the thread's original affinity
    }

    KeReleaseMutex(&g_GlobalMutex, FALSE);

    KdPrint(("VT Engine has been loaded!\n"));

    return STATUS_SUCCESS;
}

  (2) A point to watch when setting up the VMCS: after vmlaunch, execution enters the guest OS. Since the goal here is debugging, no extra code needs to run; the guest returns straight to the push EntryRflags below and keeps running in its guest OS role, which is what lets the driver finish loading.

            The general-purpose registers are not saved on the stack here but in space set aside in the data segment, which keeps the guest RSP from being modified and corrupted.

Asm_RunToVMCS Proc
    mov rax,[rsp]
    mov GuestReturn,rax ;grab the return address so that after vmlaunch the guest resumes the driver-loading code and the driver can finish loading
    
    call SetupVMCS    ;this function fills in the VMCS structure and then executes vmlaunch directly; execution then continues at the push EntryRflags line in Asm_SetupVMCS (by now running as the guest OS)
    ret
Asm_RunToVMCS Endp

Asm_SetupVMCS Proc        ;called first inside SetupVT
    cli                    ;disable interrupts so nothing can interrupt us mid-setup and wreck the stack
    mov GuestRSP,rsp    ;after vmlaunch, rsp resumes reading data from here
    
    mov EntryRAX,rax    ;filling in the VMCS inside a function changes register values, so save them first; the stack will move, so they are saved in the data segment rather than on the stack
    mov EntryRCX,rcx
    mov EntryRDX,rdx
    mov EntryRBX,rbx
    mov EntryRSP,rsp
    mov EntryEBP,rbp
    mov EntryESI,rsi
    mov EntryRDI,rdi
    mov EntryR8,r8
    mov EntryR9,r9
    mov EntryR10,r10
    mov EntryR11,r11
    mov EntryR12,r12
    mov EntryR13,r13
    mov EntryR14,r14
    mov EntryR15,r15
    
    pushfq
    pop EntryRflags
    
    call Asm_RunToVMCS    ;detour through the routine above to record the address of the next line; after vmlaunch the guest resumes execution from there
    
    push EntryRflags    ;see above: the address of this line is stored in GuestReturn, and after vmlaunch the guest continues executing from here
    popfq
    mov rax,EntryRAX    ;restore the register values
    mov rcx,EntryRCX
    mov rdx,EntryRDX
    mov rbx,EntryRBX
    mov rsp,EntryRSP
    mov rbp,EntryEBP
    mov rsi,EntryESI
    mov rdi,EntryRDI
    mov r8,EntryR8
    mov r9,EntryR9
    mov r10,EntryR10
    mov r11,EntryR11
    mov r12,EntryR12
    mov r13,EntryR13
    mov r14,EntryR14
    mov r15,EntryR15
    
    mov rsp,GuestRSP
    sti
    ret
Asm_SetupVMCS Endp

  (3) The complete code is at https://github.com/zhuhuibeishadiao ; I suggest starting with miniVT64 and debugging it until the framework and flow are familiar, then moving on to debugging PFHook.

 

   Lessons learned:

   1. When you first start debugging, set the virtual machine to a single processor with a single core. Otherwise the CPU cores run different code at the same time, and debugging feels like jumping all over the place instead of stepping through in order.
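   If the VM runs under VMware (as in the setup described at the top), the core count can be pinned in the VM's .vmx file; the option names below are what I would expect, but they may differ between VMware versions:

numvcpus = "1"
cpuid.coresPerSocket = "1"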

   

   2. Do not print too much with DbgPrint. Printing on every MSR read/write, for example, floods the log and makes the VM look hung (in fact windbg can still break in, which shows it has not actually died).
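   One simple way to keep the log usable is to rate-limit the chattiest messages. The helper below is my own sketch, not part of the framework, and only illustrates the idea:

#define MAX_PRINTS_PER_NOISY_MSR 5

VOID LogMsrRead(ULONG Msr, ULONG64 Value)
{
    static ULONG ApicBasePrints = 0;   // MSR 0x1B (IA32_APIC_BASE) is read over and over in the logs above

    if (Msr == 0x1B && ++ApicBasePrints > MAX_PRINTS_PER_NOISY_MSR)
        return;                        // stop echoing the noisy MSR after a few hits

    DbgPrint("FGP [VT] : HandleMsrRead: msr = 0x%x, value = 0x%I64x\n", Msr, Value);
}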

References:

1. https://github.com/zhuhuibeishadiao : miniVT64 and PFHook code

2. https://github.com/calware/HV-Playground : a collection of various VT frameworks

3. https://www.felixcloutier.com/x86/invvpid : the invvpid instruction reference

4. https://msrc-blog.microsoft.com/2018/12/10/first-steps-in-hyper-v-research/ : First Steps in Hyper-V Research