A strange crash on SUSE

Date: 2023-03-08 22:17:32
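I recently ran into a crash on SUSE:

I never ran mkswap /dev/<disk path> to create a swap partition, the machine has plenty of free memory, and I set nr_swapfiles to 0, yet I still hit a swap-related crash.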

We got more kernel crashes at swapin_readahead() on our kernel:
[ 948.894273] BUG: unable to handle kernel NULL pointer dereference at 000000000000000c
[ 948.894283] IP: [<ffffffff81133662>] valid_swaphandles+0x72/0x150
[ 948.894292] PGD 3e4b007067 PUD 3e4b008067 PMD
[ 948.894296] Oops: [#] SMP
[ 948.894301] CPU
[ 948.894302] Modules linked in: xfs w83627dhg(EN) af_packet tipc(EX) ossmod(EN) iptable_filter ip_tables x_tables bonding edd cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf binfmt_misc fuse loop dm_mod ipv6 ipv6_lib pcspkr i40e(EX) igb ses enclosure dca iTCO_wdt sg ptp pps_core i2c_i801 iTCO_vendor_support mei mptctl mptbase rtc_cmos button acpi_power_meter container ext3 jbd mbcache usbhid hid ttm drm_kms_helper drm i2c_algo_bit sysimgblt sysfillrect i2c_core syscopyarea ehci_hcd usbcore usb_common sd_mod crc_t10dif processor thermal_sys hwmon scsi_dh_rdac scsi_dh_emc scsi_dh_hp_sw scsi_dh_alua scsi_dh ahci libahci libata mpt3sas(EX) configfs scsi_transport_sas raid_class scsi_mod [last unloaded: witdriver]
[ 948.894346] Supported: No, Unsupported modules are loaded
[ 948.894348]
[ 948.894350] Pid: , comm: nginx Tainted: G ENX 3.0.-0.47.-default # ZTE Grantley/S1008
[ 948.894354] RIP: :[<ffffffff81133662>] [<ffffffff81133662>] valid_swaphandles+0x72/0x150
[ 948.894357] RSP: :ffff883e4b021ca8 EFLAGS:
[ 948.894359] RAX: RBX: 00181818182f98a0 RCX:
[ 948.894362] RDX: 000000000000408e RSI: RDI: ffffffff81e56ce0
[ 948.894364] RBP: R08: R09:
[ 948.894366] R10: ffff883e4b011218 R11: ffff883e4dac4bc0 R12: 00181818182f9898
[ 948.894368] R13: 00181818182f989a R14: R15: ffff883e4b021d30
[ 948.894371] FS: 00007f1c66ca7720() GS:ffff88407dda0000() knlGS:
[ 948.894373] CS: DS: ES: CR0: 000000008005003b
[ 948.894375] CR2: 000000000000000c CR3: 0000003e4b006000 CR4: 00000000001407e0
[ 948.894377] DR0: DR1: DR2:
[ 948.894380] DR3: DR6: 00000000ffff0ff0 DR7:
[ 948.894382] Process nginx (pid: , threadinfo ffff883e4b020000, task ffff883e4b01e540)
[ 948.894384] Stack:
[ 948.894385] ffff883e4b021cc4 30181818182f989a 00007f1ac86072fc
[ 948.894391] ffff883f89f0c038 ffff883fbe860d78 00000000000200da ffffffff81132bd6
[ 948.894397] 30181818182f989a
[ 948.894401] Call Trace:
[ 948.894411] [<ffffffff81132bd6>] swapin_readahead+0x26/0xd0
[ 948.894417] [<ffffffff81122a8d>] do_swap_page+0xed/0x5f0
[ 948.894422] [<ffffffff81123ab1>] handle_pte_fault+0x1e1/0x230
[ 948.894429] [<ffffffff8146873d>] do_page_fault+0x1fd/0x4c0
[ 948.894434] [<ffffffff81465345>] page_fault+0x25/0x30
[ 948.894440] [<00007f1c65877dab>] 0x7f1c65877daa
[ 948.894441] Code: ff ff ff ff c5 4c eb d3 eb d3 e3 db 4c 0f e3 e8 cc b8 e9 d3 e0 c3
[ 948.894455] 8b 0c c3 c7 8d 0f fb f8
[ 948.894462] RIP [<ffffffff81133662>] valid_swaphandles+0x72/0x150
[ 948.894465] RSP <ffff883e4b021ca8>
[ 948.894467] CR2: 000000000000000c

We also collected a vmcore:
crash> dis -l valid_swaphandles+0x72
/home/chengry/linux-3.0.-0.47./mm/swapfile.c:
0xffffffff81133662 <valid_swaphandles+>: mov 0xc(%r14),%eax

crash> dis -l valid_swaphandles
/home/chengry/linux-3.0.-0.47./mm/swapfile.c:
0xffffffff811335f0 <valid_swaphandles>: push %r15
0xffffffff811335f2 <valid_swaphandles+>: mov %rsi,%r15
/home/chengry/linux-3.0.-0.47./mm/swapfile.c:
0xffffffff811335f5 <valid_swaphandles+>: xor %esi,%esi
/home/chengry/linux-3.0.-0.47./mm/swapfile.c:
0xffffffff811335f7 <valid_swaphandles+>: push %r14
0xffffffff811335f9 <valid_swaphandles+>: push %r13
0xffffffff811335fb <valid_swaphandles+>: push %r12
0xffffffff811335fd <valid_swaphandles+>: push %rbp
0xffffffff811335fe <valid_swaphandles+>: push %rbx
0xffffffff811335ff <valid_swaphandles+>: sub $0x8,%rsp
/home/chengry/linux-3.0.-0.47./mm/swapfile.c:
0xffffffff81133603 <valid_swaphandles+>: mov 0xd233c3(%rip),%ebp # 0xffffffff81e569cc
/home/chengry/linux-3.0.-0.47./mm/swapfile.c:
0xffffffff81133609 <valid_swaphandles+>: test %ebp,%ebp
0xffffffff8113360b <valid_swaphandles+>: je 0xffffffff81133723 <valid_swaphandles+>
/home/chengry/linux-3.0.-0.47./mm/swapfile.c:
0xffffffff81133611 <valid_swaphandles+>: mov %rdi,%rax
/home/chengry/linux-3.0.-0.47./include/linux/swapops.h:
0xffffffff81133614 <valid_swaphandles+>: mov %rdi,%r13
/home/chengry/linux-3.0.-0.47./mm/swapfile.c:
0xffffffff81133617 <valid_swaphandles+>: mov %ebp,%ecx
/home/chengry/linux-3.0.-0.47./mm/swapfile.c:
0xffffffff81133619 <valid_swaphandles+>: shr $0x39,%rax
/home/chengry/linux-3.0.-0.47./include/linux/spinlock.h:
0xffffffff8113361d <valid_swaphandles+>: mov $0xffffffff81e56ce0,%rdi
/home/chengry/linux-3.0.-0.47./mm/swapfile.c:
0xffffffff81133624 <valid_swaphandles+>: mov $0x1,%r12d
/home/chengry/linux-3.0.-0.47./mm/swapfile.c:
0xffffffff8113362a <valid_swaphandles+>: mov -0x7e1a9300(,%rax,),%r14
/home/chengry/linux-3.0.-0.47./include/linux/swapops.h:
0xffffffff81133632 <valid_swaphandles+>: movabs $0x1ffffffffffffff,%rax
0xffffffff8113363c <valid_swaphandles+>: and %rax,%r13
/home/chengry/linux-3.0.-0.47./mm/swapfile.c:
0xffffffff8113363f <valid_swaphandles+>: mov %r13,%rbx
0xffffffff81133642 <valid_swaphandles+>: shr %cl,%rbx
0xffffffff81133645 <valid_swaphandles+>: shl %cl,%rbx
/home/chengry/linux-3.0.-0.47./mm/swapfile.c:
0xffffffff81133648 <valid_swaphandles+>: test %rbx,%rbx
0xffffffff8113364b <valid_swaphandles+>: cmovne %rbx,%r12
/home/chengry/linux-3.0.-0.47./include/linux/spinlock.h:
0xffffffff8113364f <valid_swaphandles+>: callq 0xffffffff81464c20 <_raw_spin_lock>
/home/chengry/linux-3.0.-0.47./mm/swapfile.c:
0xffffffff81133654 <valid_swaphandles+>: mov $0x1,%eax
0xffffffff81133659 <valid_swaphandles+>: mov %ebp,%ecx
0xffffffff8113365b <valid_swaphandles+>: shl %cl,%eax
0xffffffff8113365d <valid_swaphandles+>: cltq
0xffffffff8113365f <valid_swaphandles+>: add %rax,%rbx
0xffffffff81133662 <valid_swaphandles+>: mov 0xc(%r14),%eax
0xffffffff81133666 <valid_swaphandles+>: cmp %rax,%rbx
0xffffffff81133669 <valid_swaphandles+>: mov %rax,%rdi
/home/chengry/linux-3.0.-0.47./mm/swapfile.c:
0xffffffff8113366c <valid_swaphandles+>: lea 0x1(%r13),%rax
/home/chengry/linux-3.0.-0.47./mm/swapfile.c:
0xffffffff81133670 <valid_swaphandles+>: cmovbe %rbx,%rdi
/home/chengry/linux-3.0.-0.47./mm/swapfile.c:
0xffffffff81133674 <valid_swaphandles+>: cmp %rdi,%rax
0xffffffff81133677 <valid_swaphandles+>: jae 0xffffffff81133734 <valid_swaphandles+>
/home/chengry/linux-3.0.-0.47./mm/swapfile.c:
0xffffffff8113367d <valid_swaphandles+>: mov 0x10(%r14),%rcx
0xffffffff81133681 <valid_swaphandles+>: movzbl 0x1(%rcx,%r13,),%edx
0xffffffff81133687 <valid_swaphandles+>: test %dl,%dl
0xffffffff81133689 <valid_swaphandles+>: je 0xffffffff81133734 <valid_swaphandles+>
/home/chengry/linux-3.0.-0.47./mm/swapfile.c:
0xffffffff8113368f <valid_swaphandles+>: and $0xffffffbf,%edx
0xffffffff81133692 <valid_swaphandles+>: cmp $0x3f,%dl
0xffffffff81133695 <valid_swaphandles+>: je 0xffffffff81133734 <valid_swaphandles+>
0xffffffff8113369b <valid_swaphandles+>: add %r13,%rcx
0xffffffff8113369e <valid_swaphandles+>: xor %esi,%esi
0xffffffff811336a0 <valid_swaphandles+>: jmp 0xffffffff811336bc <valid_swaphandles+>
0xffffffff811336a2 <valid_swaphandles+>: nopw 0x0(%rax,%rax,)
/home/chengry/linux-3.0.-0.47./mm/swapfile.c:
0xffffffff811336a8 <valid_swaphandles+>: movzbl 0x2(%rcx),%edx
0xffffffff811336ac <valid_swaphandles+>: test %dl,%dl
0xffffffff811336ae <valid_swaphandles+>: je 0xffffffff811336c8 <valid_swaphandles+>

The corresponding source code should be:
int valid_swaphandles(swp_entry_t entry, unsigned long *offset)
{
	struct swap_info_struct *si;
	int our_page_cluster = page_cluster;
	pgoff_t target, toff;
	pgoff_t base, end;
	int nr_pages = 0;

	if (!our_page_cluster)	/* no readahead */
		return 0;

	si = swap_info[swp_type(entry)];
	target = swp_offset(entry);
	base = (target >> our_page_cluster) << our_page_cluster;
	end = base + (1 << our_page_cluster);
	if (!base)		/* first page is swap header */
		base++;

	spin_lock(&swap_lock);
	if (frontswap_test(si, target)) {
		spin_unlock(&swap_lock);
		return 0;
	}
	if (end > si->max)	/* don't go beyond end of map */
		end = si->max;

The faulting instruction, mov 0xc(%r14),%eax, is the load of si->max, and %r14 is 0, so si (the struct swap_info_struct pointer read from swap_info[]) is NULL. These are the related values from the vmcore:
crash> p swap_info
swap_info = $ =
{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0}
crash> p swap_list
swap_list = $ = {
head = -,
next = -
}
crash> p nr_swapfiles
nr_swapfiles = $ =
crash> p total_swap_pages
total_swap_pages = $ =
crash> p least_priority
least_priority = $ =
crash> p nr_swap_pages
nr_swap_pages = $ =
crash> p vm_swappiness
vm_swappiness = $ =
crash> p min_free_kbytes
min_free_kbytes = $ =
crash> kmem -i
              PAGES        TOTAL      PERCENTAGE
 TOTAL MEM              251.5 GB         ----
      FREE              107.1 GB    % of TOTAL MEM
      USED              144.4 GB    % of TOTAL MEM
    SHARED                8.4 GB    % of TOTAL MEM
   BUFFERS               63.3 MB    % of TOTAL MEM
    CACHED               92.9 GB    % of TOTAL MEM
      SLAB                5.4 GB    % of TOTAL MEM

TOTAL SWAP                               ----
 SWAP USED                          % of TOTAL SWAP
 SWAP FREE                          % of TOTAL SWAP
#5 [ffff883e4b021bf0] page_fault at ffffffff81465345
[exception RIP: valid_swaphandles+114]
RIP: ffffffff81133662 RSP: ffff883e4b021ca8 RFLAGS: 00010216
RAX: 0000000000000008 RBX: 00181818182f98a0 RCX: 0000000000000003
RDX: 000000000000408e RSI: 0000000000000000 RDI: ffffffff81e56ce0
RBP: 0000000000000003 R8: 0000000000000000 R9: 0000000000000029
R10: ffff883e4b011218 R11: ffff883e4dac4bc0 R12: 00181818182f9898
R13: 00181818182f989a R14: 0000000000000000 R15: ffff883e4b021d30
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000
ffff883e4b021bf8: ffff883e4b021d30 0000000000000000
ffff883e4b021c08: 00181818182f989a 00181818182f9898
ffff883e4b021c18: 0000000000000003 00181818182f98a0
ffff883e4b021c28: ffff883e4dac4bc0 ffff883e4b011218
ffff883e4b021c38: 0000000000000029 0000000000000000
ffff883e4b021c48: 0000000000000008 0000000000000003
ffff883e4b021c58: 000000000000408e 0000000000000000
ffff883e4b021c68: ffffffff81e56ce0 ffffffffffffffff
ffff883e4b021c78: ffffffff81133662 0000000000000010
ffff883e4b021c88: 0000000000010216 ffff883e4b021ca8
ffff883e4b021c98: 0000000000000000 ffffffff81133654
ffff883e4b021ca8: ffff883e4b021cc4 30181818182f989a
ffff883e4b021cb8: 0000000000000000 00007f1ac86072fc
ffff883e4b021cc8: ffff883f89f0c038 ffff883fbe860d78
ffff883e4b021cd8: 00000000000200da ffffffff81132bd6
#6 [ffff883e4b021ce0] swapin_readahead at ffffffff81132bd6
ffff883e4b021ce8: 0000000000000000 30181818182f989a
ffff883e4b021cf8: 0000000000000000 0000000000000000
ffff883e4b021d08: 0000000000000000 ffffffff81257735
ffff883e4b021d18: 0000000000000000 ffffffff8146508e
ffff883e4b021d28: ffff88407ddb3a20 ffff881fcf625800
ffff883e4b021d38: ffffffff81103809 30181818182f989a
ffff883e4b021d48: 0000000000000000 0000000000000000
ffff883e4b021d58: ffff883f89f0c038 0000000000000029
ffff883e4b021d68: ffff883e4dac4bc0 ffffffff81122a8d
#7 [ffff883e4b021d70] do_swap_page at ffffffff81122a8d
ffff883e4b021d78: ffff883e4dac4bc0 ffff883e4b011218
ffff883e4b021d88: ffff883e4b011218 00007f1ac86072fc
ffff883e4b021d98: ffff883fbe860d78 ffff883f89f0c038
ffff883e4b021da8: ffffea00de62cab0 ffff883fbe860d78
ffff883e4b021db8: ffffea00de62cab0 303030305f313430
ffff883e4b021dc8: 0000000000000000 ffff883fbe860d78
ffff883e4b021dd8: ffff883f89f0c038 0000000000000029
ffff883e4b021de8: 00007f1ac86072fc ffffffff81123ab1
#8 [ffff883e4b021df0] handle_pte_fault at ffffffff81123ab1
ffff883e4b021df8: 303030305f313430 ffffffff81123b52    <-- 303030305f313430 should be orig_pte
ffff883e4b021e08: 000000014b011218 ffff880000000358
ffff883e4b021e18: 00000029de62caa0 ffff883fbe860d78
ffff883e4b021e28: 00007f1ac86072fc 0000000000000006
ffff883e4b021e38: ffff883e4b021f58 0000000000000029
ffff883e4b021e48: ffff883e4dac4bc0 ffffffff8146873d
#9 [ffff883e4b021e50] do_page_fault at ffffffff8146873d
ffff883e4b021e58: ffff883e4dac4c20 ffff883e4b01e540
ffff883e4b021e68: ffff883e4b021fd8 ffff883e4b021fd8
ffff883e4b021e78: 0000000000010900 ffff883e4b01e540
ffff883e4b021e88: ffff883e4dd18480 ffff883e4b01e540
ffff883e4b021e98: ffffffff81055e30 dead000000100100
ffff883e4b021ea8: dead000000200200 ffffffff8146da6e
ffff883e4b021eb8: ffff883e4b021f70 ffffffff8146508e
ffff883e4b021ec8: 00000000000006fe ffffffff8146da6e
ffff883e4b021ed8: ffff883e4dac4bc0 ffff883e4b011218
ffff883e4b021ee8: 0000000000000029 0000000000000001
ffff883e4b021ef8: 0000000000000000 0000000400000063
ffff883e4b021f08: 0000000000000001 00007f1ac86072fc
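
As a cross-check of the register dump, the swap entry that appears on the stack (30181818182f989a) can be decoded by hand. The sketch below is a user-space illustration, not kernel code. The 57-bit split is an assumption read off the disassembly (shr $0x39 and the 0x1ffffffffffffff mask), and the helper names swp_type_of, swp_offset_of, readahead_base and readahead_end are hypothetical.

```c
#include <stdint.h>

/*
 * Assumed entry layout, taken from the disassembly above:
 *   type   = entry >> 57           (shr $0x39,%rax)
 *   offset = entry & (2^57 - 1)    (and with 0x1ffffffffffffff)
 */
#define SWP_TYPE_SHIFT  57
#define SWP_OFFSET_MASK ((UINT64_C(1) << SWP_TYPE_SHIFT) - 1)

/* Index into swap_info[] (hypothetical helper mirroring swp_type()). */
static inline unsigned swp_type_of(uint64_t entry)
{
	return (unsigned)(entry >> SWP_TYPE_SHIFT);
}

/* Page offset inside the swap area (mirrors swp_offset()). */
static inline uint64_t swp_offset_of(uint64_t entry)
{
	return entry & SWP_OFFSET_MASK;
}

/* base = (target >> cluster) << cluster, as in valid_swaphandles(). */
static inline uint64_t readahead_base(uint64_t target, unsigned cluster)
{
	return (target >> cluster) << cluster;
}

/* end = base + (1 << cluster), before it is clamped to si->max. */
static inline uint64_t readahead_end(uint64_t target, unsigned cluster)
{
	return readahead_base(target, cluster) + (UINT64_C(1) << cluster);
}
```

With entry 0x30181818182f989a and a page_cluster of 3 (the value in RBP and RCX), these helpers give type 24, offset 0x00181818182f989a (R13), base 0x00181818182f9898 (R12) and end 0x00181818182f98a0 (RBX). Every slot of swap_info[] is 0x0 in the dump above, slot 24 included, which is consistent with si being NULL when si->max is read.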

We have plenty of free memory, and I set nr_swapfiles to zero when the machine started. I know that nr_swapfiles being zero does not by itself mean swap is turned off, but I am still confused: why would the machine try to swap in at this point?
And why do I get a NULL pointer? I never enabled a swap partition.