Using dbx and libumem to find memory problems

Update: There is a new version of umem.dbx here (Solaris 9 support) Download it here.

I implemented a spiffy little dbx module to give basic access to libumem debugging features a while back, but I haven’t gotten much feedback on it.  Think of this blog entry like a Dunk Tank at the carnival. Throw a blog comment at me (like a bug or an RFE) and make me add some new features to my libumem dbx module.

Update March 9, 2010: It looks like Eric dunked me good.  (See the comments).  This module may or may not work for you. But it’s still probably useful as an example of how to write advanced dbx scripts.

Here are things I know to be true:

  • developers prefer dbx to mdb (unless you’re hacking on the kernel)
  • the RTC feature in Sun Studio is slow and sometimes buggy
  • memory allocation and access bugs are NASTY to track down
  • nobody has requested any new features for the libumem module

So the only conclusion I can draw is that not enough people know about this module yet.  🙂  With that in mind, here is the command line version of the ever-popular screenshot.

(dbx) source umem.dbx
(dbx) alias u=umem

(dbx) u start
Enabling libumem debugging

(dbx) cc -g t.c
(dbx) debug a.out
(dbx) list 1,$
    1   #include <malloc.h>
    2   int main()
    3   {
    4        char * p;
    5        p = malloc(1);
    6        p = malloc(1);
    7        p = malloc(1);
    8        free(p);
    9        p++;
   10        // this free will cause an error in libumem
   11        // if checking is on, because it's a bad free
   12        free(p);
   13   }
   14

(dbx) run
signal ABRT (Abort) in __lwp_kill at 0xff2bd5ec
0xff2bd5ec: __lwp_kill+0x0008:  bcc,a,pt  %icc,__lwp_kill+0x18  ! 0xff2bd5fc
Current function is main
   12        free(p);

(dbx) print p
p = 0x5bfa9 "\xad\xbe\xef\xde\xad\xbe\xef\xfe\xed\xfa\xce\xfe\xed\xfa\xce"

(dbx) u findblock 0x5bfa9

Building umem_syms helper library.
Address 0x5bfa9 is inside the umem block at 0x5bfa0.
   This corresponds to the malloc block at 0x5bfa8.

# So we can see that the pointer we tried to free points into
# the middle of a block.

# Let's ask for a history of the umem block that caused umem to barf.

(dbx) u bhist 0x5bfe0

=================================================================
       Log Rec Addr          Block Addr   Thrd         Timestamp
       ------------          ----------   ----         ---------
            0x320c8             0x5bfa0    1     0xbda1d488b8260
0x107d0 : in `a.out`_start   /* No debugging info */
0x10c1c : in `a.out`t.c`main at "t.c":7
0xff36aeb4 : in `libumem.so.1`malloc   /* No debugging info */
0xff36e2d0 : in `libumem.so.1`_umem_alloc   /* No debugging info */
0xff36de8c : in `libumem.so.1`_umem_cache_alloc   /* No debugging info */
=================================================================
       Log Rec Addr          Block Addr   Thrd         Timestamp
       ------------          ----------   ----         ---------
            0x3212c             0x5bfa0    1     0xbda1d488ba010
0x107d0 : in `a.out`_start   /* No debugging info */
0x107d0 : in `a.out`_start   /* No debugging info */
0x10c2c : in `a.out`t.c`main at "t.c":8
0xff36b214 : in `libumem.so.1`malloc.c`process_free   /* No debugging info */
0xff36dfec : in `libumem.so.1`_umem_cache_free   /* No debugging info */
=================================================================

# Don't ask me why _start shows up twice in libumem stack capture.
# It's probably a stray tail-call optimization someplace.

# If you want to see the recent history of allocations/frees, do this:

(dbx) u log
       Log Rec Addr          Block Addr   Thrd         Timestamp
       ------------          ----------   ----         ---------
            0x32000             0x5bfe0    1     0xbda1d488b15c8
            0x32064             0x5bfc0    1     0xbda1d488b6fa0
            0x320c8             0x5bfa0    1     0xbda1d488b8260
            0x3212c             0x5bfa0    1     0xbda1d488ba010

3 Responses to “Using dbx and libumem to find memory problems”

  1. Eric Safern says:

    Hello,

    I’ve been playing with your libumem module for dbx – but no joy, so far.

    Here’s what I’ve got – a simple program that corrupts memory.

    When I use mdb, I get this:

    05:14 PM sundev19:c++ $ mdb a.out
    > ::run
    leaking 20 bytes…
    leaking 20 bytes…
    leaking 20 bytes…
    leaking 20 bytes…
    leaking 20 bytes…
    mdb: stop on SIGABRT
    mdb: target stopped at:
    libc.so.1`_lwp_kill+8: bcc,a,pt %icc, +0x10
    mdb: You’ve got symbols!
    Loading modules: [ ld.so.1 libumem.so.1 libc.so.1 ]
    > ::umem_log
    CPU ADDR BUFADDR TIMESTAMP THREAD
    1 0006a190 00155ef8 20ccc937bc8c0 00000001
    1 0006a12c 00155f28 20ccc574506a0 00000001
    1 0006a0c8 00155f58 20ccc1b139930 00000001
    1 0006a064 00155f88 20ccbdf763e00 00000001
    1 0006a000 00155fb8 20ccba3c47b10 00000001
    0002a1f4 00000000 0 00000000

    But when I run the same code with dbx and your module, the transaction log is supposedly empty!

    Poking around inside dbx, I pulled up this data structure, which looks fishy to me – but I’m no expert.

    What do you think?

    (struct umem_log_header *) umem_transaction_log = 0x28008
    (dbx 24) p * ((umem_log_header *) umem_transaction_log)
    *((struct umem_log_header *) umem_transaction_log) = {
    lh_lock = {
    flags = {
    flag1 = 4U
    flag2 = ”
    ceiling = ”
    mbcp_type_un = {
    bcptype = 0
    mtype_rcount = {
    count_type1 = ”
    count_type2 = ”
    }
    }
    magic = 19800U
    }
    lock = {
    lock64 = {
    pad = “”
    }
    lock32 = {
    ownerpid = 0
    lockword = 0
    }
    owner64 = 0
    }
    data = 0
    }
    lh_base = 0x2a000 “”
    lh_free = 0x2000
    lh_chunksize = 141733920769U
    lh_nchunks = 1
    lh_head = 0
    lh_tail = 262144
    lh_hits = 19800
    lh_cpu = (
    {
    clh_lock = {
    flags = {
    flag1 = 0
    flag2 = ”
    ceiling = ”
    mbcp_type_un = {
    bcptype = 0
    mtype_rcount = {
    count_type1 = ”
    count_type2 = ”
    }
    }
    magic = 0
    }
    lock = {
    lock64 = {
    pad = “”
    }
    lock32 = {
    ownerpid = 0
    lockword = 0
    }
    owner64 = 0
    }
    data = 0
    }
    clh_current = (nil)
    clh_avail = 0
    clh_chunk = 0
    clh_hits = 0
    clh_pad = “”
    }

    )
    }

    Thanks in advance for your help!

  2. Chris Quenelle says:

    I’ll take a look, thanks for letting me know.

  3. Chris Quenelle says:

    Well this problem has been on my to-do list for months now, and I haven’t found the time to investigate. I’m really sorry about that Eric. I may still revive this project some day, but it probably won’t be soon.