Archive for the ‘Compilers’ Category

Dwarf and XML

Tuesday, December 19th, 2006

I’ve been having a hard time sifting through huge dwarfdump files over the last year or so, especially some of the huge dumps from the C++ standard template library. (Blech.) So I’ve been working on a side project to let me do more powerful queries on DWARF information.  The part of the DWARF information that I usually have to sort through is the .debug_info section.  It’s essentially an abstract syntax tree of all (or part) of the information in the object file. To make it easier to sift through, I’ve started to write an XML dumper for this information, so that I get output something like:

     <t:namespace id='1178'>
      <name          string ='1'>std     </name>
      <SUN_link_name string ='1'>__1nDstd_</SUN_link_name>
      <sibling       ref4   ='1643'/> <!--__rwstd-->
      <t:structure_type id='1197'>
         <name               string ='1'>char_traits&lt;char&gt;</name>
         <SUN_part_link_name string ='1'>nLchar_traits4Cc_</SUN_part_link_name>
         <decl_file          data1  ='3'/>
         <decl_line          data1  ='182'/>
         <SUN_template       ref4   ='1247'/> <!--char_traits-->
         <declaration        flag   ='1'/>
         <t:template_type_parameter id='1241'>
            <type ref4 ='883'/> <!--char-->
         </t:template_type_parameter>
      </t:structure_type>

Instead of the usual dwarfdump form, which is:

<1>< 1178>      DW_TAG_namespace
                DW_AT_name                  std
                DW_AT_SUN_link_name         __1nDstd_
                DW_AT_sibling               <1643>
<2>< 1197>      DW_TAG_structure_type
                DW_AT_name                  char_traits<char>
                DW_AT_SUN_part_link_name    nLchar_traits4Cc_
                DW_AT_decl_file             3 /set/c++/cafe8/mkapoor/lang5.9/libCstd.2.1.1/include/rw/traits
                DW_AT_decl_line             182
                DW_AT_SUN_template          <1247>
                DW_AT_declaration           yes(1)
<3>< 1241>      DW_TAG_template_type_parameter
                DW_AT_type                  <883>
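
The mapping between the two forms above can be sketched in a few lines of Python. This is not the dumper described in this post, just an illustration that assumes the dwarfdump text layout shown above. It drops the `t:` prefix and the form attributes (`string`, `ref4`, `data1`), since the text dump does not carry them, and the function name `dwarfdump_to_xml` and the `dwarf` root element are made up for the example.

```python
import re
import xml.etree.ElementTree as ET

# Matches a DIE header such as "<2>< 1197>      DW_TAG_structure_type".
DIE_RE = re.compile(r"^<(\d+)><\s*(\d+)>\s+DW_TAG_(\w+)")
# Matches an attribute line such as "    DW_AT_decl_line     182".
ATTR_RE = re.compile(r"^\s+DW_AT_(\w+)\s+(.*\S)")


def dwarfdump_to_xml(text):
    """Build a nested XML tree from dwarfdump-style text (assumed layout)."""
    root = ET.Element("dwarf")  # hypothetical wrapper element
    # Stack of (nesting level, element); the wrapper sits below level 0.
    stack = [(-1, root)]
    for line in text.splitlines():
        die = DIE_RE.match(line)
        if die:
            level = int(die.group(1))
            # Pop until the top of the stack is this DIE's parent.
            while stack[-1][0] >= level:
                stack.pop()
            element = ET.SubElement(stack[-1][1], die.group(3), id=die.group(2))
            stack.append((level, element))
            continue
        attr = ATTR_RE.match(line)
        if attr:
            # Each DW_AT_* line becomes a child element of the current DIE.
            ET.SubElement(stack[-1][1], attr.group(1)).text = attr.group(2)
    return root
```

Calling `ET.tostring(dwarfdump_to_xml(text), encoding="unicode")` on the dump above would give one element per DIE, nested by the level number in the first angle brackets.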

The XML format is still preliminary, but it lets me play around with using the XQuery language for searching the XML and extracting pieces of it.  (I could also use XSLT, but XQuery is a little better for joins and more complex searches.) XQuery includes the XPath syntax as a subset.  I’m sure all this is just a bunch of gobbledygook unless you already know some of this stuff, so here is an example:

In XPath, you can select all the XML nodes in a document based on what their parents are, for example:

//namespace/struct

This XPath expression would select all the “struct” XML nodes that are children of “namespace” nodes.
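
To try the same kind of parent/child selection on the XML format shown earlier, here is a small Python sketch using the limited XPath support in the standard library's `xml.etree.ElementTree`. The namespace URI `urn:example:dwarf-tags`, the `dwarf` wrapper element, and the top-level struct with id 9999 are all invented for the example; the real dumper may declare the `t:` prefix differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace binding for the "t:" prefix used in the dump.
NS = {"t": "urn:example:dwarf-tags"}

SAMPLE = """<dwarf xmlns:t="urn:example:dwarf-tags">
  <t:namespace id="1178">
    <name string="1">std</name>
    <t:structure_type id="1197">
      <name string="1">char_traits&lt;char&gt;</name>
    </t:structure_type>
  </t:namespace>
  <t:structure_type id="9999">
    <name string="1">top_level</name>
  </t:structure_type>
</dwarf>"""


def structs_in_namespaces(root):
    """Select structure_type nodes whose parent is a namespace node."""
    return root.findall(".//t:namespace/t:structure_type", NS)


root = ET.fromstring(SAMPLE)
for struct in structs_in_namespaces(root):
    print(struct.get("id"), struct.find("name").text)
```

Only the struct nested inside the namespace is selected; the top-level one is skipped because its parent is not a namespace node.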

Using XQuery I wrote a simple script to dig out all the elements with a specific name, and show the names of the containers that are their ancestors.  The pathname to Mukesh’s source tree makes a featured appearance here because that’s where I got my sample debug information from; this all started while I was trying to track down a bug in the debug info for libCstd.

% ruby dwcmd.rb dwarf xgrep findname dw.xml __unLink
<?xml version="1.0" encoding="UTF-8"?>

   /set/c++/cafe8/mkapoor/lang5.9/libCstd.2.1.1/include/string.cc - 11
   std - 1120
   basic_string<char,std::char_traits<char>,std::allocator<char> > - 1827
   __unLink - 2455

   /set/c++/cafe8/mkapoor/lang5.9/libCstd.2.1.1/include/string.cc - 11
   std - 1120
   basic_string<char,std::char_traits<char>,std::allocator<char> > - 1771
   __unLink - 2201

   /set/c++/cafe8/mkapoor/lang5.9/libCstd.2.1.1/include/ostream.cc - 11
   std - 1121
   basic_string<char,std::char_traits<char>,std::allocator<char> > - 2735
   __unLink - 2926

   /set/c++/cafe8/mkapoor/lang5.9/libCstd.2.1.1/include/ostream.cc - 11
   std - 1121
   basic_string<char,std::char_traits<char>,std::allocator<char> > - 2806
   __unLink - 2997

As you can see, an item named “__unLink” shows up 4 times.  I extended the script to allow you to filter which items you wanted to see based on the names of their containers.  So when I search for “ostream:__unLink” the script will only show me items named __unLink that are within items that have “ostream” in the name.

% ruby dwcmd.rb dwarf xgrep findname dw.xml ostream:__unLink
<?xml version="1.0" encoding="UTF-8"?>

   /set/c++/cafe8/mkapoor/lang5.9/libCstd.2.1.1/include/ostream.cc - 11
   std - 1121
   basic_string<char,std::char_traits<char>,std::allocator<char> > - 2735
   __unLink - 2926

   /set/c++/cafe8/mkapoor/lang5.9/libCstd.2.1.1/include/ostream.cc - 11
   std - 1121
   basic_string<char,std::char_traits<char>,std::allocator<char> > - 2806
   __unLink - 2997

Pretty cool, huh?
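
For anyone who wants to try the idea without XQuery, the search with a container filter can be approximated in Python. This is a sketch of the technique, not the XQuery script used above. It assumes a simplified XML layout in which every DIE element carries an `id` attribute and a `name` child, with no `t:` prefix; the helper names `die_name` and `find_name` are made up for the example.

```python
import xml.etree.ElementTree as ET


def die_name(element):
    """Return the text of an element's <name> child, or None if absent."""
    child = element.find("name")
    if child is None or not child.text:
        return None
    return child.text.strip()


def find_name(root, query):
    """Yield ancestor chains for items matching 'name' or 'container:name'.

    Each chain is a list of (name, id) pairs, outermost container first.
    """
    # "ostream:__unLink" -> container "ostream", target "__unLink".
    container, _, target = query.rpartition(":")

    def walk(element, chain):
        name = die_name(element)
        if name is not None and "id" in element.attrib:
            chain = chain + [(name, element.get("id"))]
            ancestors = chain[:-1]
            # With a filter, some ancestor's name must contain it.
            if name == target and (
                not container or any(container in n for n, _ in ancestors)
            ):
                yield chain
        for child in element:
            yield from walk(child, chain)

    return walk(root, [])
```

Printing each pair in a chain as `name - id` gives roughly the layout of the output above. The filter is a plain substring test, which is why a source path containing “ostream” is enough to match.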

Anyway, that’s as far as I got. There are always more compiler bugs to fix, so I don’t get much time to work on infrastructure and internal tools. Maybe I’ll get some more hacking done over the holidays. XML feeds into some of my areas of technical curiosity, like RDF, RDFa, SPARQL, FOAF, etc.

Linux Compilers require a glibc fix (headers)

Friday, November 11th, 2005

There is a glibc bug that keeps our new C compiler on Linux from working. The symptom looks like this:

> "helloworld.c", line 8: internal compiler error: DBGGEN ERROR:
> FILE="../src/dbg_libdwarf.c", LINE=46, Could not load dwarf library:
> libdwarf.so : libdwarf.so: cannot open shared object file: No such file
> or directory [DBG_GEN 5.0.8]
> cc: acomp failed for helloworld.c

The way to fix it on SuSE Linux Enterprise 9 (our primary tested version of SuSE) is to use this patch. But now we have someone asking how to fix it on SuSE 9.1, and I’m at the limits of my Linux experience. Here are my questions:

  • How do I find out what version of glibc is installed?
  • How do I find out which version of glibc got a specific fix?

I know I can use rpm -qf to find out which SuSE package is installed and contains /lib/libc.so.XX. But that doesn’t give me the glibc version (like 2.3.3 or whatever). The bug I’m looking at is described as “Bug 47950” in the SuSE SLES9 docs. Is that a Novell-specific bug tracking system?
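
For the first question, one possible approach (assuming a Python interpreter is available on the machine in question) is to ask the C library itself through `os.confstr`. This only reports the glibc that the running Python process is linked against, and it says nothing about which fixes a distribution has backported into that version. The function name `glibc_version` is made up for the example.

```python
import os


def glibc_version():
    """Return the running glibc version, e.g. '2.3.3', or None if unknown."""
    try:
        # On glibc systems this returns a string such as "glibc 2.3.3".
        value = os.confstr("CS_GNU_LIBC_VERSION")
    except (AttributeError, ValueError, OSError):
        # No os.confstr, or a C library that does not know this name.
        return None
    if not value:
        return None
    parts = value.split()
    if len(parts) == 2 and parts[0] == "glibc":
        return parts[1]
    return None


if __name__ == "__main__":
    print(glibc_version())
```

On a system without glibc the function returns None instead of raising.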

debug info in XML, and DSD 2.0

Sunday, September 18th, 2005

I’ve been working in my spare time on the idea of converting dwarf debugging information into XML so that I can format it as XHTML using a stylesheet, and so I can check it using a Schema of some sort. When I started fiddling today I assumed that using a DTD was the way to go; that’s what one does with XML, no? Well, after banging my head against the DTD format for a while, and looking for help on (our friend) the internet, I stumbled across a more general description of the different ways to write XML schemas. Notice the lower case schema there. One of the several ways is called simply “XML Schema”, which is an alternative to DTD and DSD etc. Don’t get confused yet, you just started reading.

I had to kick myself in the head again tonight. Every time I get really stumped on trying to find good information on the internet, I end up realizing that everything I wanted was already there in Wikipedia. In the really confusing situations where I’m jumping in the deep end of the pool, all I really need to start me out is a two-page summary of the state of the art, something to put all the technology jargon into context for me. But I still haven’t learned to look on Wikipedia first. *kick* *ouch*

God, lists look ugly in the hacked theme I’m using. I should fix them up one day. Anyway, here is good info on the different ways to formally describe your XML so that you can check it, and make sure it doesn’t have bugs.

From oldest and klunkiest to newest and hottest, the different languages are: 1) DTD 2) “XML Schema” 3) DSD 2.0

Note: Clayton Wheeler also pointed me at Relax NG, which is much more elegant.