Linux drivers: debugging a bug in the input subsystem layered/separated design example

Date: 2022-10-11 23:37:25

Previous example: http://www.cnblogs.com/weishengzhong/p/7429840.html

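The previous example has a bug at run time: after the driver modules are loaded into the kernel, if you do nothing at all and simply wait a while, the kernel hits an error and prints the following: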

Unable to handle kernel NULL pointer dereference at virtual address 00000004
pgd = c0104000
[00000004] *pgd=00000000
Internal error: Oops: 17 [#1] ARM
Modules linked in: buttondev(O) buttondrv(O)
CPU: 0 Tainted: G O (3.4.2 #13)
PC is at buttons_timer_function+0xc/0x68 [buttondrv]
LR is at run_timer_softirq+0x10c/0x244
pc : [<bf0000a0>] lr : [<c01322a4>] psr: 80000013
sp : c0589ec8 ip : bf000494 fp : c05d072c
r10: c05d032c r9 : c05d052c r8 : c0588000
r7 : bf000094 r6 : c0589ee8 r5 : bf000494 r4 : 00000000
r3 : 80000013 r2 : 00000000 r1 : 60000093 r0 : 00000000
Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel
Control: c000717f Table: 33ab8000 DAC: 00000017
Process swapper (pid: 0, stack limit = 0xc0588270)
Stack: (0xc0589ec8 to 0xc058a000)
9ec0: c05cfb20 00000100 c0589ee8 c01322a4 c05a2a9c c059cc08
9ee0: c05a896c c05d092c c0589ee8 c0589ee8 00000001 00000004 00000001 00000001
9f00: 00000100 c05cf9c0 0000000a c0588000 c05a1754 c012d524 c05a896c 00000000
9f20: c0589f94 0000001e c05a896c 00000000 c0589f94 c0580520 41129200 c0590020
9f40: 00000000 c012d748 0000001e c01163a0 c01164d4 60000013 f6000000 c01150a4
9f60: f6100000 00000032 f6100000 60000013 c0588000 c05beb68 c05932ac c05beac0
9f80: c0580520 41129200 c0590020 00000000 00000000 c0589fa8 c01164c8 c01164d4
9fa0: 60000013 ffffffff c0588000 c0116b68 c0590170 c05beb40 c058137c c055e868
9fc0: 00000000 00000000 c055e3d4 00000000 00000000 c058137c 00000000 c0007175
9fe0: c0590094 c0581378 c05932a4 30104000 3057f878 30108040 00000000 00000000
[<bf0000a0>] (buttons_timer_function+0xc/0x68 [buttondrv]) from [<c01322a4>] (run_timer_softirq+0x10c/0x244)
[<c01322a4>] (run_timer_softirq+0x10c/0x244) from [<c012d524>] (__do_softirq+0x88/0x148)
[<c012d524>] (__do_softirq+0x88/0x148) from [<c012d748>] (irq_exit+0x48/0x50)
[<c012d748>] (irq_exit+0x48/0x50) from [<c01163a0>] (handle_IRQ+0x34/0x84)
[<c01163a0>] (handle_IRQ+0x34/0x84) from [<c01150a4>] (__irq_svc+0x24/0xa0)
[<c01150a4>] (__irq_svc+0x24/0xa0) from [<c01164d4>] (default_idle+0x28/0x50)
[<c01164d4>] (default_idle+0x28/0x50) from [<c0116b68>] (cpu_idle+0x94/0xbc)
[<c0116b68>] (cpu_idle+0x94/0xbc) from [<c055e868>] (start_kernel+0x260/0x2f4)
Code: bf000494 e92d4070 e59f5058 e5954020 (e5940004) 
---[ end trace c5ecb8c491baf7b3 ]---
Kernel panic - not syncing: Fatal exception in interrupt

 

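Skimming the output, the fault occurred at "PC is at buttons_timer_function+0xc/0x68 [buttondrv]". Running cat /proc/kallsyms lists the symbol addresses of both the loaded modules and the kernel itself, and shows that buttons_timer_function was loaded at kernel address bf000094.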

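Now look at the register values at the time of the fault: pc : [<bf0000a0>] lr : [<c01322a4>]. The PC was at bf0000a0, inside the loaded module, and LR shows that the function had been called from the kernel at c01322a4 (run_timer_softirq+0x10c/0x244). So why does the function containing bf0000a0 fail?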

Disassemble buttondrv.ko with arm-linux-objdump -D buttondrv.ko > buttondrv.dis and look at the following part of the listing:

00000094 <buttons_timer_function>:
  94:    e92d4070     push    {r4, r5, r6, lr}
  98:    e59f5058     ldr    r5, [pc, #88]    ; f8 <buttons_timer_function+0x64>
  9c:    e5954020     ldr    r4, [r5, #32]
  a0:    e5940004     ldr    r0, [r4, #4]
  a4:    ebfffffe     bl    0 <s3c2410_gpio_getpin>
  a8:    e2506000     subs    r6, r0, #0    ; 0x0
  ac:    1a00000a     bne    dc <buttons_timer_function+0x48>
  b0:    e3a01001     mov    r1, #1    ; 0x1
  b4:    e1a03001     mov    r3, r1
  b8:    e5942000     ldr    r2, [r4]
  bc:    e595001c     ldr    r0, [r5, #28]
  c0:    ebfffffe     bl    0 <input_event>
  c4:    e1a01006     mov    r1, r6
  c8:    e595001c     ldr    r0, [r5, #28]
  cc:    e1a02001     mov    r2, r1
  d0:    e1a03001     mov    r3, r1
  d4:    e8bd4070     pop    {r4, r5, r6, lr}
  d8:    eafffffe     b    0 <input_event>
  dc:    e3a01001     mov    r1, #1    ; 0x1
  e0:    e5942000     ldr    r2, [r4]
  e4:    e595001c     ldr    r0, [r5, #28]
  e8:    e3a03000     mov    r3, #0    ; 0x0
  ec:    ebfffffe     bl    0 <input_event>
  f0:    e3a01000     mov    r1, #0    ; 0x0
  f4:    eafffff3     b    c8 <buttons_timer_function+0x34>
  f8:    00000000     .word    0x00000000

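buttons_timer_function was loaded at kernel address bf000094, which corresponds to offset 0x94 in the disassembly. The faulting PC is bf0000a0, that is 0xc bytes into the function, so the faulting instruction is the one at 0xa0.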

The next instruction, at a4, is the call to s3c2410_gpio_getpin, so the faulting load is setting up that call. Here is the C source of buttons_timer_function:

static void buttons_timer_function ( unsigned long data )
{
    struct gpio_keys_button *buttonkey = ( struct gpio_keys_button * ) irq_pd;
    u32 pinval;

    pinval = s3c2410_gpio_getpin ( buttonkey->gpio );

    if ( pinval )
    {
        input_event ( iputdev, EV_KEY, buttonkey->code, 0 );
        input_sync ( iputdev );
    }
    else
    {
        input_event ( iputdev, EV_KEY, buttonkey->code, 1 );
        input_sync ( iputdev );
    }
}

 

The call to s3c2410_gpio_getpin is this statement:

pinval = s3c2410_gpio_getpin ( buttonkey->gpio );

It takes a single argument, buttonkey->gpio, which on ARM is passed in register r0. So let's see what the assembly loads into r0:
a0:    e5940004     ldr    r0, [r4, #4]

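This is the instruction that sets r0: it loads into r0 the word stored at address r4 + 4. In the register dump r4 is 00000000 at the time of the fault, so the load reads from address 0x00000004 (r0 still holds 0 because the load never completed). That is exactly the address in the kernel message: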
Unable to handle kernel NULL pointer dereference at virtual address 00000004
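Why does this happen? The argument must be the problem. buttonkey is a pointer to the gpio_keys_button that describes the pin, and r4 = 00000000 shows that it was NULL here, presumably because irq_pd had not been assigned yet when the timer fired. Dereferencing a NULL pointer triggers the fault, and the way to avoid it is to check the pointer before using it.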

So, before the call to s3c2410_gpio_getpin, check whether the pointer is NULL by adding:
if ( !buttonkey )
    return;

 

Recompile and load the modules again, and the problem no longer occurs.