Archive for May, 2007

Using dbx and libumem to find memory problems

Friday, May 18th, 2007

Update: There is a new version of umem.dbx with Solaris 9 support. Download it here.

I implemented a spiffy little dbx module a while back to give basic access to the libumem debugging features, but I haven’t gotten much feedback on it. Think of this blog entry as a dunk tank at the carnival: throw a blog comment at me (like a bug or an RFE) and make me add some new features to my libumem dbx module.

Update March 9, 2010: It looks like Eric dunked me good.  (See the comments).  This module may or may not work for you. But it’s still probably useful as an example of how to write advanced dbx scripts.

Here are things I know to be true:

  • developers prefer dbx to mdb (unless they’re hacking on the kernel)
  • the RTC feature in Sun Studio is slow and sometimes buggy
  • memory allocation and access bugs are NASTY to track down
  • nobody has requested any new features for the libumem module

So the only conclusion I can draw is that not enough people know about this module yet.  🙂  With that in mind, here is the command line version of the ever-popular screenshot.

(dbx) source umem.dbx
(dbx) alias u=umem

(dbx) u start
Enabling libumem debugging

(dbx) cc -g t.c
(dbx) debug a.out
(dbx) list 1,$
    1   #include <malloc.h>
    2   int main()
    3   {
    4        char * p;
    5        p = malloc(1);
    6        p = malloc(1);
    7        p = malloc(1);
    8        free(p);
    9        p++;
   10        // this free will cause an error in libumem
   11        // if checking is on, because it's a bad free
   12        free(p);
   13   }
   14

(dbx) run
signal ABRT (Abort) in __lwp_kill at 0xff2bd5ec
0xff2bd5ec: __lwp_kill+0x0008:  bcc,a,pt  %icc,__lwp_kill+0x18  ! 0xff2bd5fc
Current function is main
   12        free(p);

(dbx) print p
p = 0x5bfa9 "\xad\xbe\xef\xde\xad\xbe\xef\xfe\xed\xfa\xce\xfe\xed\xfa\xce"

(dbx) u findblock 0x5bfa9

Building umem_syms helper library.
Address 0x5bfa9 is inside the umem block at 0x5bfa0.
   This corresponds to the malloc block at 0x5bfa8.

# So we can see that the pointer we tried to free points into
# the middle of a block.

# Let's ask for a history of the umem block that caused umem to barf.

(dbx) u bhist 0x5bfa0

=================================================================
       Log Rec Addr          Block Addr   Thrd         Timestamp
       ------------          ----------   ----         ---------
            0x320c8             0x5bfa0    1     0xbda1d488b8260
0x107d0 : in `a.out`_start   /* No debugging info */
0x10c1c : in `a.out`t.c`main at "t.c":7
0xff36aeb4 : in `libumem.so.1`malloc   /* No debugging info */
0xff36e2d0 : in `libumem.so.1`_umem_alloc   /* No debugging info */
0xff36de8c : in `libumem.so.1`_umem_cache_alloc   /* No debugging info */
=================================================================
       Log Rec Addr          Block Addr   Thrd         Timestamp
       ------------          ----------   ----         ---------
            0x3212c             0x5bfa0    1     0xbda1d488ba010
0x107d0 : in `a.out`_start   /* No debugging info */
0x107d0 : in `a.out`_start   /* No debugging info */
0x10c2c : in `a.out`t.c`main at "t.c":8
0xff36b214 : in `libumem.so.1`malloc.c`process_free   /* No debugging info */
0xff36dfec : in `libumem.so.1`_umem_cache_free   /* No debugging info */
=================================================================

# Don't ask me why _start shows up twice in the libumem stack capture.
# It's probably a stray tail-call optimization someplace.

# If you want to see the recent history of allocations/frees, do this:

(dbx) u log
       Log Rec Addr          Block Addr   Thrd         Timestamp
       ------------          ----------   ----         ---------
            0x32000             0x5bfe0    1     0xbda1d488b15c8
            0x32064             0x5bfc0    1     0xbda1d488b6fa0
            0x320c8             0x5bfa0    1     0xbda1d488b8260
            0x3212c             0x5bfa0    1     0xbda1d488ba010
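
For anyone wondering what the findblock step in the session above is telling you, here is a rough C sketch of the same kind of check. It is not the code inside umem.dbx. The names (classify_pointer, ptr_info, BLOCK_STRIDE, MALLOC_OFFSET) are made up for illustration, and the two constants are simply read off this one session (blocks 0x20 apart in the log, malloc pointer 8 bytes past the umem block), so don't treat them as libumem facts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumptions taken from the transcript only, not from libumem itself. */
#define BLOCK_STRIDE  0x20u  /* spacing between block addresses in `u log` */
#define MALLOC_OFFSET 0x8u   /* 0x5bfa8 - 0x5bfa0 in the findblock output  */

typedef enum { PTR_NOT_FOUND, PTR_BLOCK_START, PTR_INTERIOR } ptr_kind;

typedef struct {
    ptr_kind  kind;
    uintptr_t umem_block;    /* start of the containing umem block        */
    uintptr_t malloc_block;  /* address malloc() would have handed out    */
} ptr_info;

/* Find the block containing addr and report whether addr is a pointer
 * that free() could legitimately be given (the malloc block start) or
 * one that points somewhere else inside the block. */
static ptr_info classify_pointer(const uintptr_t *blocks, size_t nblocks,
                                 uintptr_t addr)
{
    ptr_info info = { PTR_NOT_FOUND, 0, 0 };

    assert(blocks != NULL || nblocks == 0);

    for (size_t i = 0; i < nblocks; i++) {
        uintptr_t base = blocks[i];
        if (addr >= base && addr - base < BLOCK_STRIDE) {
            info.umem_block   = base;
            info.malloc_block = base + MALLOC_OFFSET;
            info.kind = (addr == info.malloc_block) ? PTR_BLOCK_START
                                                    : PTR_INTERIOR;
            return info;
        }
    }
    return info;
}
```

Fed the three block addresses from the log and the bad pointer 0x5bfa9, this reports an interior pointer in the umem block at 0x5bfa0 whose malloc block is 0x5bfa8, which is the same conclusion the findblock output gives.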

Latest Sun dwarf extensions

Friday, May 11th, 2007

I’ve been working with the Sun lawyers and the Dwarf Standards Committee recently to change the overly zealous license on the Sun Dwarf Extensions document.  I think we’ve finally gotten it down to something reasonable.  Anyway, we’ve added a few twists for C++ and Fortran 90.  As an example, there are some new structures for identifying code segments that implement C++ destructors, and correlating them to the object being destroyed. So with that introduction, here is the latest document.

dbx .ldynsym support – stack traces for stripped programs

Thursday, May 3rd, 2007

Stack traces for stripped programs should get easier to read on Solaris. Solaris Nevada added a new strip-proof symbol table that keeps some of the symbols that normally get removed by the strip command, basically static functions. Static functions have always been the number one cause of unreadable stack traces in stripped programs, but now the Solaris utilities (dtrace, mdb, pstack, etc.) and also dbx (in Sun Studio 12 FCS) will be able to make use of these new symbols. The new Solaris feature is described here.
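
To make the static function problem concrete, here is a small sketch of the kind of code involved. It is a hypothetical stand-in, not the t.c used in the traces that follow, whose source isn't shown. The names run_demo and should_abort are invented here; only baz and the call to abort are taken from the traces.

```c
#include <stdlib.h>

/* A file-local (static) function. Its symbol is one of those that strip
 * normally removes, so without the new symbol table a debugger can only
 * show the frame as a raw address. */
static int baz(int should_abort)
{
    if (should_abort)
        abort();        /* raises SIGABRT, which produces the core file */
    return 0;
}

/* Hypothetical non-static wrapper standing in for main(), so the sketch
 * can be exercised without always aborting. */
int run_demo(int should_abort)
{
    return baz(should_abort);
}
```

Calling run_demo(1) from main, then compiling and stripping the result, should give a core file along the lines of the ones shown here, with the baz frame being the one whose name depends on the new symbols.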

Before:

% cc t.c && strip a.out && ./a.out ; dbx -c 'where;quit' - core
"t.c", line 2: warning: implicit function declaration: abort
Abort (core dumped)
Corefile specified executable: "/home/quenelle/a.out"
Reading a.out
core file header read successfully
Reading ld.so.1
Reading libc.so.1
program terminated by signal ABRT (Abort)
0xbff80717: __lwp_kill+0x0007:  jae      __lwp_kill+0x15        [ 0xbff80725, .+0xe ]
=>[1] __lwp_kill(0x1, 0x6), at 0xbff80717
[2] _thr_kill(0x1, 0x6), at 0xbff7ded4
[3] raise(0x6), at 0xbff2ced3
[4] abort(0x8047408, 0x80506db, 0x8047514, 0x8047428, 0x805062d, 0x1), at 0xbff10969
[5] 0x80506c8(0x8047514, 0x8047428, 0x805062d, 0x1, 0x8047434, 0x804743c), at 0x80506c8
[6] main(0x1, 0x8047434, 0x804743c, 0x80505cf), at 0x80506db

After:


% cc t.c && strip a.out && ./a.out ; dbx -c 'where;quit' - core
"t.c", line 2: warning: implicit function declaration: abort
Abort (core dumped)
Corefile specified executable: "/home/quenelle/./a.out"
Reading a.out
core file header read successfully
Reading ld.so.1
Reading libc.so.1
program terminated by signal ABRT (Abort)
0xff344a24: __lwp_kill+0x0008:  bcc,a,pt  %icc,__lwp_kill+0x18  ! 0xff344a34
=>[1] __lwp_kill(0x0, 0xffffffff, 0x0, 0x0, 0xfffffffc, 0x0), at 0xff344a24
[2] raise(0x6, 0x0, 0x5, 0x6, 0xffffffff, 0x6), at 0xff2f7504
[3] abort(0xff386a80, 0x1, 0x6, 0xff3836c0, 0xacf34, 0x0), at 0xff2d3824
[4] baz(0x0, 0x1000, 0xff385ac0, 0xff3a2000, 0x0, 0x4), at 0x10e6c
[5] main(0x1, 0xffbff3e4, 0xffbff3ec, 0x21000, 0xac71c, 0xff3a0140), at 0x10e9c