Installing kdump on CentOS 7.5

Date: 2022-09-15 16:52:47

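First, download three packages.

The version string in each file name must match the kernel release reported by uname -r (that is, the kernel that crashed). The packages can be downloaded from http://debuginfo.centos.org/7/x86_64/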

kernel-debug-debuginfo-3.10.0-862.14.4.el7.x86_64.rpm
kernel-debuginfo-3.10.0-862.14.4.el7.x86_64.rpm
kernel-debuginfo-common-x86_64-3.10.0-862.14.4.el7.x86_64.rpm
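Each file name above is simply a package name followed by the uname -r string, so the download can be scripted. The following is a minimal sketch, assuming wget is installed and the mirror keeps the naming shown above; debuginfo_urls and install_debuginfo are hypothetical helper names, not part of any CentOS tool.

```shell
#!/bin/sh
# Hypothetical helpers: build and fetch the debuginfo packages for one kernel.
# Assumes the debuginfo.centos.org layout and file naming shown above.

DEBUGINFO_BASE="http://debuginfo.centos.org/7/x86_64"

# debuginfo_urls KVER: print one package URL per line for kernel release KVER
# (KVER is the string printed by uname -r, e.g. 3.10.0-862.14.4.el7.x86_64).
debuginfo_urls() {
    for pkg in kernel-debuginfo-common-x86_64 kernel-debuginfo kernel-debug-debuginfo; do
        echo "$DEBUGINFO_BASE/$pkg-$1.rpm"
    done
}

# install_debuginfo KVER: download the three packages into the current
# directory, then install them in one rpm transaction so that
# kernel-debuginfo finds its kernel-debuginfo-common dependency.
install_debuginfo() {
    for url in $(debuginfo_urls "$1"); do
        wget -q "$url" || return 1
    done
    rpm -ivh kernel-debuginfo-common-x86_64-"$1".rpm \
             kernel-debuginfo-"$1".rpm \
             kernel-debug-debuginfo-"$1".rpm
}
```

For example, run install_debuginfo "$(uname -r)" as root in an empty directory on the machine whose kernel crashed.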

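After installing the packages, open the dump with crash, passing the debug vmlinux and the vmcore (the same paths appear in the KERNEL and DUMPFILE lines below):

crash /usr/lib/debug/lib/modules/3.10.0-862.14.4.el7.x86_64/vmlinux /var/crash/127.0.0.1-2018-11-03-22:12:29/vmcore

crash 7.2.0-6.el7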
Copyright (C) 2002-2017  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.

GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

WARNING: kernel relocated [858MB]: patching 82671 gdb minimal_symbol values

      KERNEL: /usr/lib/debug/lib/modules/3.10.0-862.14.4.el7.x86_64/vmlinux
    DUMPFILE: /var/crash/127.0.0.1-2018-11-03-22:12:29/vmcore  [PARTIAL DUMP]
        CPUS: 48
        DATE: Sat Nov  3 22:12:19 2018
      UPTIME: 1 days, 23:16:57
LOAD AVERAGE: 0.00, 0.07, 0.26
       TASKS: 1158
    NODENAME: localhost.localdomain
     RELEASE: 3.10.0-862.14.4.el7.x86_64
     VERSION: #1 SMP Wed Sep 26 15:12:11 UTC 2018
     MACHINE: x86_64  (2300 Mhz)
      MEMORY: 191.6 GB
       PANIC: "SysRq : Trigger a crash"
         PID: 230493
     COMMAND: "bash"
        TASK: ffff9fbee1f56eb0  [THREAD_INFO: ffff9fbf07700000]
         CPU: 5
       STATE: TASK_RUNNING (SYSRQ)

crash> bt
PID: 230493  TASK: ffff9fbee1f56eb0  CPU: 5   COMMAND: "bash"
 #0 [ffff9fbf07703ae8] machine_kexec at ffffffffb6a62a0a
 #1 [ffff9fbf07703b48] __crash_kexec at ffffffffb6b166c2
 #2 [ffff9fbf07703c18] crash_kexec at ffffffffb6b167b0
 #3 [ffff9fbf07703c30] oops_end at ffffffffb711d728
 #4 [ffff9fbf07703c58] no_context at ffffffffb710c84d
 #5 [ffff9fbf07703ca8] __bad_area_nosemaphore at ffffffffb710c8e4
 #6 [ffff9fbf07703cf8] bad_area_nosemaphore at ffffffffb710ca55
 #7 [ffff9fbf07703d08] __do_page_fault at ffffffffb71206e0
 #8 [ffff9fbf07703d70] do_page_fault at ffffffffb71208d5
 #9 [ffff9fbf07703da0] page_fault at ffffffffb711c758
    [exception RIP: sysrq_handle_crash+22]
    RIP: ffffffffb6e33b56  RSP: ffff9fbf07703e58  RFLAGS: 00010246
    RAX: ffffffffb6e33b40  RBX: ffffffffb76d7b20  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: ffff9fbf18b53978  RDI: 0000000000000063
    RBP: ffff9fbf07703e58   R8: ffffffffb79c28bc   R9: ffffffffb79ff607
    R10: 0000000000000b37  R11: 0000000000000b36  R12: 0000000000000063
    R13: 0000000000000000  R14: 0000000000000004  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#10 [ffff9fbf07703e60] __handle_sysrq at ffffffffb6e3430f
#11 [ffff9fbf07703e90] write_sysrq_trigger at ffffffffb6e347e8
#12 [ffff9fbf07703ea8] proc_reg_write at ffffffffb6c94e00
#13 [ffff9fbf07703ec8] vfs_write at ffffffffb6c1f240
#14 [ffff9fbf07703f08] sys_write at ffffffffb6c2006f
#15 [ffff9fbf07703f50] system_call_fastpath at ffffffffb712579b
    RIP: 00007f6eea301cd0  RSP: 00007ffceabd5e10  RFLAGS: 00010246
    RAX: 0000000000000001  RBX: 0000000000000002  RCX: 0000000000000000
    RDX: 0000000000000002  RSI: 00007f6eeac2c000  RDI: 0000000000000001
    RBP: 00007f6eeac2c000   R8: 000000000000000a   R9: 00007f6eeac12740
    R10: 00007f6eeac12740  R11: 0000000000000246  R12: 00007f6eea5d9400
    R13: 0000000000000002  R14: 0000000000000001  R15: 0000000000000000
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
crash> ps
   PID    PPID  CPU       TASK        ST  %MEM     VSZ    RSS  COMM
>     0      0   0  ffffffffb7616480  RU   0.0       0      0  [swapper/0]
>     0      0   1  ffff9fa85ac33f40  RU   0.0       0      0  [swapper/1]
>     0      0   2  ffff9fa85ac34f10  RU   0.0       0      0  [swapper/2]
>     0      0   3  ffff9fa85ac35ee0  RU   0.0       0      0  [swapper/3]
>     0      0   4  ffff9fa85ac36eb0  RU   0.0       0      0  [swapper/4]
      0      0   5  ffff9fa85ac60000  RU   0.0       0      0  [swapper/5]
>     0      0   6  ffff9fa85ac60fd0  RU   0.0       0      0  [swapper/6]
>     0      0   7  ffff9fa85ac61fa0  RU   0.0       0      0  [swapper/7]
>     0      0   8  ffff9fa85ac62f70  RU   0.0       0      0  [swapper/8]
>     0      0   9  ffff9fa85ac63f40  RU   0.0       0      0  [swapper/9]
>     0      0  10  ffff9fa85ac64f10  RU   0.0       0      0  [swapper/10]
>     0      0  11  ffff9fa85ac65ee0  RU   0.0       0      0  [swapper/11]
>     0      0  12  ffff9fbfdb3c0000  RU   0.0       0      0  [swapper/12]
>     0      0  13  ffff9fbfdb3c0fd0  RU   0.0       0      0  [swapper/13]
>     0      0  14  ffff9fbfdb3c1fa0  RU   0.0       0      0  [swapper/14]
>     0      0  15  ffff9fbfdb3c2f70  RU   0.0       0      0  [swapper/15]
>     0      0  16  ffff9fbfdb3c3f40  RU   0.0       0      0  [swapper/16]
>     0      0  17  ffff9fbfdb3c4f10  RU   0.0       0      0  [swapper/17]
>     0      0  18  ffff9fbfdb3c5ee0  RU   0.0       0      0  [swapper/18]
>     0      0  19  ffff9fbfdb3c6eb0  RU   0.0       0      0  [swapper/19]
>     0      0  20  ffff9fbfdb3f0000  RU   0.0       0      0  [swapper/20]
>     0      0  21  ffff9fbfdb3f0fd0  RU   0.0       0      0  [swapper/21]
>     0      0  22  ffff9fbfdb3f1fa0  RU   0.0       0      0  [swapper/22]
>     0      0  23  ffff9fbfdb3f2f70  RU   0.0       0      0  [swapper/23]
>     0      0  24  ffff9fa85ac66eb0  RU   0.0       0      0  [swapper/24]
>     0      0  25  ffff9fa85ac98000  RU   0.0       0      0  [swapper/25]
>     0      0  26  ffff9fa85ac98fd0  RU   0.0       0      0  [swapper/26]
>     0      0  27  ffff9fa85ac99fa0  RU   0.0       0      0  [swapper/27]
>     0      0  28  ffff9fa85ac9af70  RU   0.0       0      0  [swapper/28]
>     0      0  29  ffff9fa85ac9bf40  RU   0.0       0      0  [swapper/29]
>     0      0  30  ffff9fa85ac9cf10  RU   0.0       0      0  [swapper/30]
>     0      0  31  ffff9fa85ac9dee0  RU   0.0       0      0  [swapper/31]
>     0      0  32  ffff9fa85ac9eeb0  RU   0.0       0      0  [swapper/32]
>     0      0  33  ffff9fa85acc0000  RU   0.0       0      0  [swapper/33]
>     0      0  34  ffff9fa85acc0fd0  RU   0.0       0      0  [swapper/34]
>     0      0  35  ffff9fa85acc1fa0  RU   0.0       0      0  [swapper/35]
>     0      0  36  ffff9fbfdb3f6eb0  RU   0.0       0      0  [swapper/36]
>     0      0  37  ffff9fbfdb3f5ee0  RU   0.0       0      0  [swapper/37]
>     0      0  38  ffff9fbfdb3f4f10  RU   0.0       0      0  [swapper/38]
>     0      0  39  ffff9fbfdb3f3f40  RU   0.0       0      0  [swapper/39]
>     0      0  40  ffff9fbfdac18000  RU   0.0       0      0  [swapper/40]
>     0      0  41  ffff9fbfdac18fd0  RU   0.0       0      0  [swapper/41]
>     0      0  42  ffff9fbfdac19fa0  RU   0.0       0      0  [swapper/42]
>     0      0  43  ffff9fbfdac1af70  RU   0.0       0      0  [swapper/43]
>     0      0  44  ffff9fbfdac1bf40  RU   0.0       0      0  [swapper/44]
>     0      0  45  ffff9fbfdac1cf10  RU   0.0       0      0  [swapper/45]
crash>

  

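The log command is very important: many faults leave their traces in the kernel log (dmesg), and log prints the kernel log buffer stored in the vmcore. Generally, only the dmesg messages from the most recent crash are available.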

crash> log

This prints the kernel messages (dmesg) leading up to the crash.
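A long kernel log is easier to work through if it is saved to a file (crash can redirect a command's output, e.g. crash> log > log.txt) and searched for typical fault markers. A minimal sketch with a hypothetical scan_log helper; the pattern list is an assumption covering some common kernel messages, not an exhaustive list.

```shell
#!/bin/sh
# scan_log FILE: print lines (with line numbers) from a saved crash `log`
# output that commonly mark the start of a kernel failure.
# The pattern list is an assumption; extend it for the fault being chased.
scan_log() {
    grep -nE 'BUG:|Oops|Kernel panic|SysRq : Trigger a crash|Call Trace:|blocked for more than|Out of memory' "$1"
}
```

For example, scan_log log.txt | head shows where the first suspicious messages appear, which is usually where to start reading the full log.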


Check memory usage at the time of the crash:

crash> kmem -i
                 PAGES        TOTAL      PERCENTAGE
    TOTAL MEM  49362433     188.3 GB         ----
         FREE  48239978       184 GB   97% of TOTAL MEM
         USED  1122455       4.3 GB    2% of TOTAL MEM
       SHARED   569792       2.2 GB    1% of TOTAL MEM
      BUFFERS        0            0    0% of TOTAL MEM
       CACHED   580597       2.2 GB    1% of TOTAL MEM
         SLAB    66681     260.5 MB    0% of TOTAL MEM

   TOTAL HUGE        0            0         ----
    HUGE FREE        0            0    0% of TOTAL HUGE

   TOTAL SWAP  1048575         4 GB         ----
    SWAP USED   416721       1.6 GB   39% of TOTAL SWAP
    SWAP FREE   631854       2.4 GB   60% of TOTAL SWAP

 COMMIT LIMIT  25729791      98.2 GB         ----
    COMMITTED  2367291         9 GB    9% of TOTAL LIMIT
crash>
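The kmem -i columns can be cross-checked by hand. TOTAL is PAGES times the page size, with crash's "GB" meaning 2^30 bytes (49362433 x 4096 / 2^30 is about 188.3). The percentages appear to be truncated rather than rounded: 48239978 / 49362433 is about 97.7%, shown as 97%. A small sketch, assuming 4 KiB pages (the x86_64 base page size); pages_to_gb and pct are hypothetical helpers.

```shell
#!/bin/sh
# pages_to_gb PAGES: convert a page count to GiB, assuming 4096-byte pages.
pages_to_gb() {
    awk -v p="$1" 'BEGIN { printf "%.1f GB\n", p * 4096 / 1073741824 }'
}

# pct PART TOTAL: integer percentage, truncated like the kmem -i output.
pct() {
    echo $(( $1 * 100 / $2 ))
}
```

For example, pages_to_gb 1122455 prints 4.3 GB (the USED row) and pct 416721 1048575 prints 39 (the SWAP USED row).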

  

ps shows the processes that existed at the time of the crash; its output can be piped to grep to search it.
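For a quick overview, the ps output can be saved (crash> ps > ps.txt) and the tasks counted per state; a large number of UN (uninterruptible) tasks often points at I/O or lock problems. A minimal sketch with a hypothetical ps_states helper, assuming the column layout of crash's ps output shown above, where ST is column 5 once the leading > marker is removed.

```shell
#!/bin/sh
# ps_states FILE: count tasks per state (ST column) in saved crash `ps` output.
# The '>' that marks each CPU's active task is stripped first so that every
# task line has the same columns: PID PPID CPU TASK ST %MEM VSZ RSS COMM.
ps_states() {
    sed 's/^>//' "$1" |
        awk '$1 ~ /^[0-9]+$/ { count[$5]++ }
             END { for (st in count) print st, count[st] }' |
        sort
}
```

Header and prompt lines are skipped because their first field is not a numeric PID.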

net shows network information at the time of the crash.

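bt prints the kernel stack backtrace of the task that crashed. Frames are listed newest first, so read it from the bottom up. In this dump, bash entered the kernel through write() (system_call_fastpath, sys_write, vfs_write) on /proc/sysrq-trigger (proc_reg_write, write_sysrq_trigger); __handle_sysrq then ran sysrq_handle_crash, which deliberately dereferences a NULL pointer. The resulting page fault ended in oops_end, and crash_kexec/machine_kexec switched to the kdump capture kernel. In other words, this was a test crash triggered with SysRq c, e.g. echo c > /proc/sysrq-trigger (RDI at the exception holds 0x63, the character 'c').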
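Reading bottom-up can also be done mechanically on a saved backtrace (crash> bt > bt.txt). A minimal sketch with a hypothetical bt_chain helper: it assumes the frame-line layout of the bt output earlier in this session ("#N [stack address] function at address") and also picks up the exception RIP line, which names the function that actually faulted.

```shell
#!/bin/sh
# bt_chain FILE: print the functions of a saved crash `bt` output in call
# order (oldest first), i.e. the backtrace read from the bottom up.
# Frame lines look like: " #9 [ffff9fbf07703da0] page_fault at ffffffffb711c758"
# The "[exception RIP: func+off]" line is kept so the faulting function shows up.
bt_chain() {
    awk '$1 ~ /^#[0-9]+$/ && $4 == "at" { fn[n++] = $3 }
         $1 == "[exception" && $2 == "RIP:" { sub(/\]$/, "", $3); fn[n++] = $3 }
         END { for (i = n - 1; i >= 0; i--) print fn[i] }' "$1"
}
```

On this dump it would list system_call_fastpath first and machine_kexec last, with sysrq_handle_crash+22 between __handle_sysrq and page_fault.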

 


The sys command shows the time of the crash (the DATE field) along with basic system information:

crash> sys
      KERNEL: /usr/lib/debug/lib/modules/3.10.0-862.14.4.el7.x86_64/vmlinux
    DUMPFILE: /var/crash/127.0.0.1-2018-11-03-22:12:29/vmcore  [PARTIAL DUMP]
        CPUS: 48
        DATE: Sat Nov  3 22:12:19 2018
      UPTIME: 1 days, 23:16:57
LOAD AVERAGE: 0.00, 0.07, 0.26
       TASKS: 1158
    NODENAME: localhost.localdomain
     RELEASE: 3.10.0-862.14.4.el7.x86_64
     VERSION: #1 SMP Wed Sep 26 15:12:11 UTC 2018
     MACHINE: x86_64  (2300 Mhz)
      MEMORY: 191.6 GB
       PANIC: "SysRq : Trigger a crash"
crash>
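DATE and UPTIME together give the time of the last boot before the crash. A minimal sketch with a hypothetical boot_time helper, assuming GNU date (standard on CentOS); both times are treated as plain wall-clock values, so a daylight-saving change in between is ignored. For this dump, 2018-11-03 22:12:19 minus 1 day 23:16:57 gives 2018-11-01 22:55:22.

```shell
#!/bin/sh
# boot_time "YYYY-MM-DD HH:MM:SS" "UPTIME": print crash time minus uptime.
# UPTIME is in crash's format, e.g. "1 days, 23:16:57" or just "23:16:57".
# -u makes date do plain wall-clock arithmetic (no local DST rules).
boot_time() {
    crash_ts=$(date -u -d "$1" +%s) || return 1
    up_secs=$(printf '%s\n' "$2" | awk '{
        days = ($2 ~ /^day/) ? $1 : 0     # "1 days," -> 1; absent -> 0
        split($NF, t, ":")                # last field is HH:MM:SS
        print days * 86400 + t[1] * 3600 + t[2] * 60 + t[3]
    }')
    date -u -d "@$((crash_ts - up_secs))" '+%Y-%m-%d %H:%M:%S'
}
```

For example, boot_time "2018-11-03 22:12:19" "1 days, 23:16:57" prints 2018-11-01 22:55:22.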

  

 

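List the tasks that were running at the time of the crash. In crash's ps output, a leading > marks the task that was active on each CPU: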

 

crash> ps | grep '>'
>     0      0   0  ffffffffb7616480  RU   0.0       0      0  [swapper/0]
>     0      0   1  ffff9fa85ac33f40  RU   0.0       0      0  [swapper/1]
>     0      0   2  ffff9fa85ac34f10  RU   0.0       0      0  [swapper/2]
>     0      0   3  ffff9fa85ac35ee0  RU   0.0       0      0  [swapper/3]
>     0      0   4  ffff9fa85ac36eb0  RU   0.0       0      0  [swapper/4]
>     0      0   6  ffff9fa85ac60fd0  RU   0.0       0      0  [swapper/6]
>     0      0   7  ffff9fa85ac61fa0  RU   0.0       0      0  [swapper/7]
>     0      0   8  ffff9fa85ac62f70  RU   0.0       0      0  [swapper/8]
>     0      0   9  ffff9fa85ac63f40  RU   0.0       0      0  [swapper/9]
>     0      0  10  ffff9fa85ac64f10  RU   0.0       0      0  [swapper/10]
>     0      0  11  ffff9fa85ac65ee0  RU   0.0       0      0  [swapper/11]
>     0      0  12  ffff9fbfdb3c0000  RU   0.0       0      0  [swapper/12]
>     0      0  13  ffff9fbfdb3c0fd0  RU   0.0       0      0  [swapper/13]
>     0      0  14  ffff9fbfdb3c1fa0  RU   0.0       0      0  [swapper/14]
>     0      0  15  ffff9fbfdb3c2f70  RU   0.0       0      0  [swapper/15]
>     0      0  16  ffff9fbfdb3c3f40  RU   0.0       0      0  [swapper/16]
>     0      0  17  ffff9fbfdb3c4f10  RU   0.0       0      0  [swapper/17]
>     0      0  18  ffff9fbfdb3c5ee0  RU   0.0       0      0  [swapper/18]
>     0      0  19  ffff9fbfdb3c6eb0  RU   0.0       0      0  [swapper/19]
>     0      0  20  ffff9fbfdb3f0000  RU   0.0       0      0  [swapper/20]
>     0      0  21  ffff9fbfdb3f0fd0  RU   0.0       0      0  [swapper/21]
>     0      0  22  ffff9fbfdb3f1fa0  RU   0.0       0      0  [swapper/22]
>     0      0  23  ffff9fbfdb3f2f70  RU   0.0       0      0  [swapper/23]
>     0      0  24  ffff9fa85ac66eb0  RU   0.0       0      0  [swapper/24]
>     0      0  25  ffff9fa85ac98000  RU   0.0       0      0  [swapper/25]
>     0      0  26  ffff9fa85ac98fd0  RU   0.0       0      0  [swapper/26]
>     0      0  27  ffff9fa85ac99fa0  RU   0.0       0      0  [swapper/27]
>     0      0  28  ffff9fa85ac9af70  RU   0.0       0      0  [swapper/28]
>     0      0  29  ffff9fa85ac9bf40  RU   0.0       0      0  [swapper/29]
>     0      0  30  ffff9fa85ac9cf10  RU   0.0       0      0  [swapper/30]
>     0      0  31  ffff9fa85ac9dee0  RU   0.0       0      0  [swapper/31]
>     0      0  32  ffff9fa85ac9eeb0  RU   0.0       0      0  [swapper/32]
>     0      0  33  ffff9fa85acc0000  RU   0.0       0      0  [swapper/33]
>     0      0  34  ffff9fa85acc0fd0  RU   0.0       0      0  [swapper/34]
>     0      0  35  ffff9fa85acc1fa0  RU   0.0       0      0  [swapper/35]
>     0      0  36  ffff9fbfdb3f6eb0  RU   0.0       0      0  [swapper/36]
>     0      0  37  ffff9fbfdb3f5ee0  RU   0.0       0      0  [swapper/37]
>     0      0  38  ffff9fbfdb3f4f10  RU   0.0       0      0  [swapper/38]
>     0      0  39  ffff9fbfdb3f3f40  RU   0.0       0      0  [swapper/39]
>     0      0  40  ffff9fbfdac18000  RU   0.0       0      0  [swapper/40]
>     0      0  41  ffff9fbfdac18fd0  RU   0.0       0      0  [swapper/41]
>     0      0  42  ffff9fbfdac19fa0  RU   0.0       0      0  [swapper/42]
>     0      0  43  ffff9fbfdac1af70  RU   0.0       0      0  [swapper/43]
>     0      0  44  ffff9fbfdac1bf40  RU   0.0       0      0  [swapper/44]
>     0      0  45  ffff9fbfdac1cf10  RU   0.0       0      0  [swapper/45]
>     0      0  46  ffff9fbfdac1dee0  RU   0.0       0      0  [swapper/46]
>     0      0  47  ffff9fbfdac1eeb0  RU   0.0       0      0  [swapper/47]
> 230493  230480   5  ffff9fbee1f56eb0  RU   0.0  116760   3568  bash
crash>
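Most of these entries are idle swapper tasks. Filtering them out leaves only the CPUs that were doing real work, which here is just bash on CPU 5. A minimal sketch with a hypothetical active_tasks helper, working on saved ps output (crash> ps > ps.txt) and assuming the > marker is followed by whitespace, as in the output above.

```shell
#!/bin/sh
# active_tasks FILE: list the non-idle task that was active on each CPU,
# taken from saved crash `ps` output. Active-task lines start with '>',
# which awk sees as its own field, so PID is $2, CPU is $4 and COMM starts at $10.
active_tasks() {
    awk '$1 == ">" && $10 !~ /^\[swapper\// {
             comm = $10
             for (i = 11; i <= NF; i++) comm = comm " " $i   # COMM may contain spaces
             printf "CPU %s: PID %s %s\n", $4, $2, comm
         }' "$1"
}
```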

  

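Find the programs using the most memory at the time of the crash. Here ps.txt holds crash's ps output saved to a file (crash can redirect a command's output, e.g. crash> ps > ps.txt). The sed step removes the > marker so that every line has the same columns. Note that sort -k 7 sorts on column 7, VSZ (virtual size); use -k 8 to sort on RSS (resident memory) instead.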

[root@localhost home]# cat ps.txt | sed "s/^>//" | sort -n -k 7 |tail -n 20
   8641   7507  20  ffff9fbefa629fa0  IN   0.1 6954692 161904  JS Helper
   8642   7507  16  ffff9fbefa62af70  IN   0.1 6954692 161904  JS Helper
   8643   7507  21  ffff9fbefa62bf40  IN   0.1 6954692 161904  JS Helper
   8644   7507  37  ffff9fbefa62cf10  IN   0.1 6954692 161904  JS Helper
   8733   7507  37  ffff9fbefb6c4f10  IN   0.1 6954692 161904  llvmpipe-0
   8734   7507  37  ffff9fbefb6c3f40  IN   0.1 6954692 161904  llvmpipe-1
   8736   7507  32  ffff9fbefb6c5ee0  IN   0.1 6954692 161904  llvmpipe-2
   8737   7507  33  ffff9fbefe250000  IN   0.1 6954692 161904  llvmpipe-3
   8738   7507  38  ffff9fbefe250fd0  IN   0.1 6954692 161904  llvmpipe-4
   8739   7507  32  ffff9fbefe251fa0  IN   0.1 6954692 161904  llvmpipe-5
   8740   7507  32  ffff9fbefe252f70  IN   0.1 6954692 161904  llvmpipe-6
   8741   7507  32  ffff9fbefe253f40  IN   0.1 6954692 161904  llvmpipe-7
   8742   7507  32  ffff9fbefe254f10  IN   0.1 6954692 161904  llvmpipe-8
   8743   7507  32  ffff9fbefe255ee0  IN   0.1 6954692 161904  llvmpipe-9
   8744   7507  32  ffff9fbefe256eb0  IN   0.1 6954692 161904  llvmpipe-10
   8745   7507  32  ffff9fa8591f0000  IN   0.1 6954692 161904  llvmpipe-11
   8746   7507  32  ffff9fa8591f0fd0  IN   0.1 6954692 161904  llvmpipe-12
   8747   7507  32  ffff9fa8591f1fa0  IN   0.1 6954692 161904  llvmpipe-13
   8748   7507  32  ffff9fa8591f2f70  IN   0.1 6954692 161904  llvmpipe-14
   8749   7507  32  ffff9fa8591f3f40  IN   0.1 6954692 161904  llvmpipe-15
[root@localhost home]#
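A variant of the pipeline above that ranks by RSS, skips the header and prompt lines, and sorts in descending order so no tail is needed. top_rss is a hypothetical helper working on the same ps.txt.

```shell
#!/bin/sh
# top_rss FILE [N]: print the N (default 20) tasks with the largest RSS
# from saved crash `ps` output. RSS is column 8 once the leading '>'
# (active-task marker) is removed; non-task lines are skipped.
top_rss() {
    sed 's/^>//' "$1" |
        awk '$1 ~ /^[0-9]+$/' |
        sort -k 8,8nr |
        head -n "${2:-20}"
}
```

For example, top_rss ps.txt 20. In the output above, the identical VSZ and RSS values of the JS Helper and llvmpipe entries suggest threads sharing one address space, so the same memory is listed once per thread.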

  

 
