A core dump case on HP-UX B.11.31 U ia64 with Oracle 10.2

Time: 2023-02-06 12:22:36

A Pro*C-based application runs correctly on AIX with either Oracle 9i or Oracle 10g, and it also runs correctly on HP-UX with Oracle 9i. On HP-UX with Oracle 10g, however, it always dumps core when connecting to the database.

 

The core file shows the following:

 

Program received signal SIGBUS, Bus error
  si_code: 1 - BUS_ADRALN - Invalid address alignment. Please refer to the following link that helps in handling unaligned data: http://docs.hp.com/en/7730/newhelp0610/pragmas.htm#pragma-pack-ex3.
warning: Load module /oracle/product/102/lib/libclntsh.so.10.1 has been stripped. 
Debugging information is not available.


warning: Load module /oracle/product/102/lib/libnnz10.so has been stripped. 
Debugging information is not available.

#0  0xc0000000004c6dd0:0 in getsockopt+0xb0 ()
   from /usr/lib/hpux64/libxnet.so.1
(gdb) where
#0  0xc0000000004c6dd0:0 in getsockopt+0xb0 ()
   from /usr/lib/hpux64/libxnet.so.1
#1  0xc000000008ea0440:0 in <unknown_procedure> + 0x750 ()
   from /oracle/product/102/lib/libclntsh.so.10.1
#2  0xc000000008c3d420:0 in ntconn+0x1e0 ()
   from /oracle/product/102/lib/libclntsh.so.10.1
#3  0xc000000008c44940:0 in <unknown_procedure> + 0x1a0 ()
   from /oracle/product/102/lib/libclntsh.so.10.1
#4  0xc000000008c3f980:0 in ntevpwi+0xc0 ()
   from /oracle/product/102/lib/libclntsh.so.10.1
#5  0xc000000008c40150:0 in ntgbuini+0x210 ()
   from /oracle/product/102/lib/libclntsh.so.10.1
#6  0xc000000008c0f890:0 in nsgblini+0x590 ()
   from /oracle/product/102/lib/libclntsh.so.10.1
#7  0xc000000008c4b4c0:0 in niotns+0x6c0 ()
   from /oracle/product/102/lib/libclntsh.so.10.1
#8  0xc000000008d47de0:0 in nigcall+0xa0 ()
   from /oracle/product/102/lib/libclntsh.so.10.1
#9  0xc000000009456510:0 in <unknown_procedure> + 0x750 ()
   from /oracle/product/102/lib/libclntsh.so.10.1
#10 0xc000000008683a20:0 in kpuadef+0x80 ()
   from /oracle/product/102/lib/libclntsh.so.10.1
#11 0xc0000000088b22e0:0 in upiini+0x420 ()
   from /oracle/product/102/lib/libclntsh.so.10.1
#12 0xc000000008871280:0 in upiah0+0x80 ()
   from /oracle/product/102/lib/libclntsh.so.10.1
#13 0xc000000008682790:0 in kpuatch+0x800 ()
   from /oracle/product/102/lib/libclntsh.so.10.1
#14 0xc0000000088b8a20:0 in OCIServerAttach+0xe0 ()
   from /oracle/product/102/lib/libclntsh.so.10.1
#15 0xc000000008608350:0 in <unknown_procedure> + 0x2f0 ()
   from /oracle/product/102/lib/libclntsh.so.10.1
#16 0xc00000000860a500:0 in sqllam+0x200 ()
   from /oracle/product/102/lib/libclntsh.so.10.1
#17 0xc00000000861cad0:0 in sqllo3t+0x390 ()
   from /oracle/product/102/lib/libclntsh.so.10.1
#18 0xc000000008618730:0 in <unknown_procedure> + 0x350 ()
   from /oracle/product/102/lib/libclntsh.so.10.1
#19 0xc00000000861bf70:0 in sqlexp+0x18b0 ()
   from /oracle/product/102/lib/libclntsh.so.10.1
#20 0xc00000000860d420:0 in <unknown_procedure> + 0xb30 ()
   from /oracle/product/102/lib/libclntsh.so.10.1
#21 0xc00000000860e870:0 in sqlcxt+0x110 ()
   from /oracle/product/102/lib/libclntsh.so.10.1

 

I searched for "BUS_ADRALN - Invalid address alignment" but found no helpful clue.

So I checked the definition of getsockopt. The manual lists three prototypes:


 #include <sys/socket.h>

      int getsockopt(
          int         s,
          int         level,
          int         optname,
          void       *optval,
          int        *optlen
      );

UNIX 03 Only (X/Open Sockets)
      int getsockopt(
          int                    s,
          int                    level,
          int                    optname,
          void       *__restrict optval,
          socklen_t  *__restrict optlen
      );

Obsolescent UNIX 95 Only (X/Open Sockets)
      int getsockopt(
          int         s,
          int         level,
          int         optname,
          void       *optval,
          size_t     *optlen
      );

 

The program uses the getsockopt in /usr/lib/hpux64/libxnet.so.1, as the core file shows. The makefile includes the option -lxnet, and libxnet.so is just a link to libxnet.so.1.

I inferred that something goes wrong when this library is linked in.

I then found that libc.a also provides getsockopt, by running nm libc.a, so I decided to remove the -lxnet option from the makefile. Luckily, that solved the problem.