I’ve been having a hard time sifting through huge DWARF dump files over the last year or so, especially some of the enormous dumps from the C++ standard template library. (Blech.) So I’ve been working on a side project that lets me run more powerful queries on DWARF information. The part of the DWARF information I usually have to sort through is the .debug_info section. It’s essentially a tree of debugging information entries (DIEs), much like an abstract syntax tree, describing all (or part) of the types, functions, and variables in the object file. To make it easier to sift through, I’ve started writing an XML dumper for this information, so that I get output something like:

   <t:namespace id='1178'>
      <name          string ='1'>std     </name>
      <SUN_link_name string ='1'>__1nDstd_</SUN_link_name>
      <sibling       ref4   ='1643'/> <!--__rwstd-->
      <t:structure_type id='1197'>
         <name               string ='1'>char_traits&lt;char&gt;</name>
         <SUN_part_link_name string ='1'>nLchar_traits4Cc_</SUN_part_link_name>
         <decl_file          data1  ='3'/>
         <decl_line          data1  ='182'/>
         <SUN_template       ref4   ='1247'/> <!--char_traits-->
         <declaration        flag   ='1'/>
         <t:template_type_parameter id='1241'>
            <type ref4 ='883'/> <!--char-->
         </t:template_type_parameter>
      </t:structure_type>

Instead of the usual dwarfdump form, which is:

<1>< 1178>      DW_TAG_namespace
                DW_AT_name                  std
                DW_AT_SUN_link_name         __1nDstd_
                DW_AT_sibling               <1643>
<2>< 1197>      DW_TAG_structure_type
                DW_AT_name                  char_traits<char>
                DW_AT_SUN_part_link_name    nLchar_traits4Cc_
                DW_AT_decl_file             3 /set/c++/cafe8/mkapoor/lang5.9/libCstd.2.1.1/include/rw/traits
                DW_AT_decl_line             182
                DW_AT_SUN_template          <1247>
                DW_AT_declaration           yes(1)
<3>< 1241>      DW_TAG_template_type_parameter
                DW_AT_type                  <883>
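To give a feel for what the dumper has to do, here’s a minimal sketch in Ruby (the language of dwcmd.rb) that turns an in-memory DIE into the XML shape above. It assumes a hypothetical Die/Attr representation that already holds what dwarfdump prints; reading the DIEs out of the object file is left out, and the real dumper may work quite differently. The naming rules are inferred from the sample: the DW_TAG_ and DW_AT_ prefixes are dropped, tags get a t: prefix, string-form attributes go into element content, and every other form becomes an attribute named after the form. Escaping matters because C++ names such as char_traits<char> contain angle brackets.

```ruby
# Hypothetical in-memory DIE: names as dwarfdump prints them, plus the form.
Attr = Struct.new(:name, :form, :value)
Die  = Struct.new(:offset, :tag, :attrs, :children)

# Escape the characters that would break element content or '...' attributes.
def xml_escape(s)
  s.to_s.gsub(/[&<>']/, "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", "'" => "&apos;")
end

# Emit one DIE (and, recursively, its children) in the dumper's XML style.
def die_to_xml(die, depth = 0, out = +"")
  pad = "   " * depth
  tag = "t:" + die.tag.sub(/\ADW_TAG_/, "")
  out << "#{pad}<#{tag} id='#{die.offset}'>\n"
  die.attrs.each do |a|
    name = a.name.sub(/\ADW_AT_/, "")
    if a.form == "string"
      # String values go in element content; string='1' appears to just record the form.
      out << "#{pad}   <#{name} string='1'>#{xml_escape(a.value)}</#{name}>\n"
    else
      # Other forms become an attribute named after the form (ref4, data1, flag, ...).
      out << "#{pad}   <#{name} #{a.form}='#{xml_escape(a.value)}'/>\n"
    end
  end
  die.children.each { |child| die_to_xml(child, depth + 1, out) }
  out << "#{pad}</#{tag}>\n"
end

# The char_traits<char> entry from the example, with its template parameter.
param = Die.new(1241, "DW_TAG_template_type_parameter",
                [Attr.new("DW_AT_type", "ref4", 883)], [])
traits = Die.new(1197, "DW_TAG_structure_type",
                 [Attr.new("DW_AT_name", "string", "char_traits<char>"),
                  Attr.new("DW_AT_decl_line", "data1", 182),
                  Attr.new("DW_AT_declaration", "flag", 1)],
                 [param])
puts die_to_xml(traits)
```

The t: prefix still needs an xmlns:t declaration on the root element before a namespace-aware XML parser will accept the output.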

The XML format is still preliminary, but it lets me experiment with the XQuery language for searching the XML and extracting pieces of it. (I could also use XSLT, but XQuery is a little better suited to joins and more complex searches.) XQuery includes XPath as a subset. I’m sure all this is just a bunch of gobbledygook unless you already know some of this stuff, so here is an example:

In XPath, you can select all the XML nodes in a document based on what their parents are, for example:

//namespace/struct

This XPath expression would select all the “struct” XML nodes that are children of “namespace” nodes.
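The XPath above uses generic element names. Against the dumper’s format, the equivalent query is //t:namespace/t:structure_type. As a quick way to try that without an XQuery engine, here is a sketch using the XPath support in Ruby’s REXML library, which is a stand-in for, not a copy of, the XQuery setup described here. The XML is a trimmed copy of the dump above, and the URI bound to the t: prefix is made up, since the excerpt doesn’t show the real declaration.

```ruby
require "rexml/document"

# Trimmed copy of the dump; the URI bound to t: is a made-up placeholder.
XML_DUMP = <<~XML
  <t:namespace xmlns:t='urn:example:dwarf' id='1178'>
    <name string='1'>std</name>
    <t:structure_type id='1197'>
      <name string='1'>char_traits&lt;char&gt;</name>
      <declaration flag='1'/>
    </t:structure_type>
  </t:namespace>
XML

NS = { "t" => "urn:example:dwarf" }

doc = REXML::Document.new(XML_DUMP)

# Select t:structure_type elements whose parent is a t:namespace element.
structs = REXML::XPath.match(doc, "//t:namespace/t:structure_type", NS)
structs.each do |s|
  puts "#{s.attributes['id']}: #{s.elements['name'].text}"
end
```

Note that the text of the name element comes back unescaped, so the match prints as char_traits<char>.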

Using XQuery, I wrote a simple script to dig out all the elements with a specific name and show the names of the containers that are their ancestors. The pathname to Mukesh’s source tree makes a featured appearance here because that’s where I got my sample debug information; this whole project started while I was trying to track down a bug in the debug info for libCstd.
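A rough equivalent of findname can be sketched in Ruby with REXML instead of XQuery. This is not the actual script: the sample document, the compile_unit and class_type element names, the namespace URI, and the helper names die_name and find_by_name are all hypothetical. It visits every element below the root, keeps those whose name child matches, and walks up the parent chain collecting each named container with its id, printed in the same "name - id" style as the run that follows.

```ruby
require "rexml/document"

# Hypothetical sample: one hit from the run, wrapped in the dumper's element
# style. The compile_unit/class_type tags and the namespace URI are guesses.
SAMPLE_XML = <<~XML
  <t:compile_unit xmlns:t='urn:example:dwarf' id='11'>
    <name string='1'>/set/c++/cafe8/mkapoor/lang5.9/libCstd.2.1.1/include/string.cc</name>
    <t:namespace id='1120'>
      <name string='1'>std</name>
      <t:class_type id='1827'>
        <name string='1'>basic_string&lt;char,std::char_traits&lt;char&gt;,std::allocator&lt;char&gt; &gt;</name>
        <t:subprogram id='2455'>
          <name string='1'>__unLink</name>
        </t:subprogram>
      </t:class_type>
    </t:namespace>
  </t:compile_unit>
XML

# A DIE's name is the text of its <name> child, if it has one.
def die_name(el)
  name_el = el.elements["name"]
  name_el && name_el.text.to_s.strip
end

# For every element below the root whose name equals target, return the
# chain of [name, id] pairs from the outermost named container to the match.
def find_by_name(doc, target)
  hits = []
  doc.root.each_recursive do |el|
    next unless el.attributes["id"] && die_name(el) == target
    chain = []
    node = el
    # Walk up to the document, keeping only ancestors that look like DIEs.
    until node.nil? || node.is_a?(REXML::Document)
      name = die_name(node)
      chain.unshift([name, node.attributes["id"]]) if name && node.attributes["id"]
      node = node.parent
    end
    hits << chain
  end
  hits
end

doc = REXML::Document.new(SAMPLE_XML)
find_by_name(doc, "__unLink").each do |chain|
  chain.each { |name, id| puts "   #{name} - #{id}" }
  puts
end
```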

% ruby dwcmd.rb dwarf xgrep findname dw.xml __unLink
<?xml version="1.0" encoding="UTF-8"?>

   /set/c++/cafe8/mkapoor/lang5.9/libCstd.2.1.1/include/string.cc - 11
   std - 1120
   basic_string<char,std::char_traits<char>,std::allocator<char> > - 1827
   __unLink - 2455

   /set/c++/cafe8/mkapoor/lang5.9/libCstd.2.1.1/include/string.cc - 11
   std - 1120
   basic_string<char,std::char_traits<char>,std::allocator<char> > - 1771
   __unLink - 2201

   /set/c++/cafe8/mkapoor/lang5.9/libCstd.2.1.1/include/ostream.cc - 11
   std - 1121
   basic_string<char,std::char_traits<char>,std::allocator<char> > - 2735
   __unLink - 2926

   /set/c++/cafe8/mkapoor/lang5.9/libCstd.2.1.1/include/ostream.cc - 11
   std - 1121
   basic_string<char,std::char_traits<char>,std::allocator<char> > - 2806
   __unLink - 2997

As you can see, an item named “__unLink” shows up four times. I extended the script to let you filter the items you want to see based on the names of their containers. So when I search for “ostream:__unLink”, the script shows me only the items named __unLink that are nested inside items with “ostream” in their name.
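Here is one way the container filter could work, sketched as a Ruby post-processing step over hit chains like the ones printed above; the real script may well do the filtering inside the XQuery instead. The query syntax is assumed: the text after the last colon must equal the item’s own name, and the text before it (if any) must appear somewhere in an ancestor’s name. The HITS data is copied from the first run; filter_hits is a hypothetical name.

```ruby
# Hits from the first run: [name, id] pairs, outermost container first.
STRING_CC    = "/set/c++/cafe8/mkapoor/lang5.9/libCstd.2.1.1/include/string.cc"
OSTREAM_CC   = "/set/c++/cafe8/mkapoor/lang5.9/libCstd.2.1.1/include/ostream.cc"
BASIC_STRING = "basic_string<char,std::char_traits<char>,std::allocator<char> >"

HITS = [
  [[STRING_CC, "11"], ["std", "1120"], [BASIC_STRING, "1827"], ["__unLink", "2455"]],
  [[STRING_CC, "11"], ["std", "1120"], [BASIC_STRING, "1771"], ["__unLink", "2201"]],
  [[OSTREAM_CC, "11"], ["std", "1121"], [BASIC_STRING, "2735"], ["__unLink", "2926"]],
  [[OSTREAM_CC, "11"], ["std", "1121"], [BASIC_STRING, "2806"], ["__unLink", "2997"]],
]

# Hypothetical query syntax "container:name": the part after the last ':'
# must equal the item's own name; the part before it (if any) must occur in
# the name of at least one ancestor.
def filter_hits(hits, query)
  container, _sep, name = query.rpartition(":")
  hits.select do |chain|
    ancestors = chain[0...-1]
    item_name = chain.last[0]
    item_name == name &&
      (container.empty? || ancestors.any? { |anc, _id| anc.include?(container) })
  end
end

filter_hits(HITS, "ostream:__unLink").each do |chain|
  chain.each { |n, id| puts "   #{n} - #{id}" }
  puts
end
```

Splitting on the last colon keeps the sketch short, but it means a query containing a C++ scope operator, such as std::string, would need a different separator.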

% ruby dwcmd.rb dwarf xgrep findname dw.xml ostream:__unLink
<?xml version="1.0" encoding="UTF-8"?>

   /set/c++/cafe8/mkapoor/lang5.9/libCstd.2.1.1/include/ostream.cc - 11
   std - 1121
   basic_string<char,std::char_traits<char>,std::allocator<char> > - 2735
   __unLink - 2926

   /set/c++/cafe8/mkapoor/lang5.9/libCstd.2.1.1/include/ostream.cc - 11
   std - 1121
   basic_string<char,std::char_traits<char>,std::allocator<char> > - 2806
   __unLink - 2997

Pretty cool, huh?

Anyway, that’s as far as I’ve gotten. There are always more compiler bugs to fix, so I don’t get much time to work on infrastructure and internal tools. Maybe I’ll get some more hacking done over the holidays. XML also ties into some of my other areas of technical curiosity, like RDF, RDFa, SPARQL, FOAF, etc.