Introduction

During my years in the Sun debugger group I had lots of impromptu design discussions with other members of the team. At some point someone would usually say: “Well in theory this module shouldn’t talk to that module….” or “We definitely need a better interface between XYZ and ABC here.” I thought it would be interesting to try to summarize what I remember about those discussions. It’s been a few years, so I’ve surely forgotten some of the noteworthy stuff, and of course many of the boring fundamentals are left out. Caveat Lector.

What follows is my attempt to summarize the primary modules in a modern debugger, and briefly describe their function and relationship with each other. There’s nothing here about UI. Debugger user interfaces are also very important and interesting, but it’s a topic that seems well suited for an independent discussion.

tools

A tool could be a shell script written by a user to perform a program-specific data dumping task, or it could be a full debugger GUI, or it could be a command-line interface that mimics the traditional gdb or dbx interfaces. Because the APIs support multiple sessions and asynchronous interaction, a debugger can support operations like stepping through one thread while other threads are running. The APIs to access the underlying debugger system should support multiple language bindings to facilitate both tool development and application-specific scripting by developers. Ideally a general-purpose interface description language or wrapper generator (like IDL or SWIG) should be used to support automatic generation of bindings in multiple languages.
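To make that concrete, here is a minimal sketch of what a binding-friendly surface might look like: flat calls and opaque handles that an IDL description or a SWIG wrapper could process mechanically. Every name here is invented for illustration.

    // Hypothetical tool-facing API surface; C linkage keeps it easy to bind.
    extern "C" {

    typedef struct dbg_session dbg_session_t;   /* opaque session handle */

    dbg_session_t *dbg_session_open(const char *server_uri);
    int  dbg_attach(dbg_session_t *s, long pid);
    int  dbg_step_thread(dbg_session_t *s, long tid);  /* async; returns a request id */
    int  dbg_poll_event(dbg_session_t *s, char *buf, int buflen);
    void dbg_session_close(dbg_session_t *s);

    }  /* extern "C" */

Asynchronous entry points like dbg_step_thread would return immediately with a request id, and completion would arrive later through dbg_poll_event, which is what lets one thread step while others keep running.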

session management

The session management module allows the tool front ends to disconnect from a running session, and it supports interoperability between different tools. To this end, it should be the provider of all the APIs accessible to tools. Many of the APIs will be passed directly through to one of the other modules listed here, like process inspection or symbol table access, but all of them will need to participate in some sort of session to load or run executables or libraries. This module makes calls into almost all of the other modules.
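A rough sketch of that pass-through role, with all types and module interfaces invented for illustration:

    #include <cstdint>
    #include <string>
    #include <vector>

    // Stub modules, standing in for the real ones described in this article.
    struct ExpressionEvaluator { double eval(const std::string &) { return 0; } };
    struct ProcessInspector {
        std::vector<uint8_t> read(uint64_t, size_t) { return {}; }
    };
    struct SymbolStore { uint64_t lookupGlobal(const std::string &) { return 0; } };

    // The session module as a thin facade: it owns the other modules and
    // forwards tool-facing calls to whichever one does the real work.
    class Session {
    public:
        double evaluate(const std::string &expr) { return evaluator_.eval(expr); }
        std::vector<uint8_t> readMemory(uint64_t addr, size_t len) {
            return inspector_.read(addr, len);
        }
        uint64_t lookupGlobal(const std::string &name) {
            return symbols_.lookupGlobal(name);
        }
    private:
        ExpressionEvaluator evaluator_;
        ProcessInspector    inspector_;
        SymbolStore         symbols_;
    };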

One of the common themes in debugger design is the dual perspectives of your program as an executable on disk and your program as an executing process image. Without a clean separation between these two, it’s easy to have unnecessary dependencies creep into otherwise simple modules. The session management module sees your program from both of these perspectives, and it should be protected from all of the nitty-gritty details as much as possible. This module is the traffic cop.

process management

Whether your program consists of a single process on the local host, or a set of related processes on several different hosts, the processes should be coordinated by a single manager module. This module manages the difficult task of starting up a process and intercepting the execution before any user code is run. Many of the necessary details are operating-system specific, and so delegating the right operations to the process server is important. This module should support the debugging of interpreted languages by detecting what kind of program is being debugged, and using an appropriate method to start up the process, like starting bash, a JVM, or a Python interpreter. If a programming system requires a special start-up hook (like mpirun for MPI programs), those hooks should be understood and utilized by the process management module.
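The start-up decision might begin with a file-sniffing step along these lines; the detection rules and launcher choices below are simplified, invented examples:

    #include <fstream>
    #include <string>

    // Pick a launcher by examining the first bytes of the program file.
    std::string chooseLauncher(const std::string &path) {
        std::ifstream f(path, std::ios::binary);
        std::string head(64, '\0');
        f.read(&head[0], head.size());

        if (head.compare(0, 4, "\x7f" "ELF") == 0)
            return path;                     // native binary: exec directly
        if (head.compare(0, 2, "#!") == 0)   // script: run its interpreter
            return head.substr(2, head.find('\n') - 2);
        if (head.compare(0, 4, "\xca\xfe\xba\xbe") == 0)
            return "java";                   // JVM class-file magic number
        return "/bin/sh";                    // fallback guess
    }

Whatever launcher is chosen, the process must still be intercepted before user code runs, for example by trapping at an early entry point.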

process inspection

This module acts as an intermediary between the process server and other debugger modules that need access to target process state. The process inspection module will need to communicate with multiple, possibly heterogeneous process servers. This means it needs to have switchable knowledge of all the supported register sets and other hardware state, and be able to vector them through a common API to other modules. This module also acts as an intermediary for reading and writing the process address space, and accessing other kernel state like thread enumeration and signal status and masks. Enumerating the segments of the process virtual address space, and exposing the attributes of each segment is also an important function of this module.
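One way to structure that switchable knowledge, sketched here with invented names: each architecture’s register file hides behind a single interface, and other modules ask for registers by role rather than by hardware layout.

    #include <cstdint>
    #include <map>
    #include <string>

    // Common register-set interface; one subclass per supported architecture.
    struct RegisterSet {
        virtual ~RegisterSet() = default;
        virtual int indexOf(const std::string &name) const = 0;  // "pc", "sp", ...
        virtual uint64_t read(int index) const = 0;
    };

    struct Sparc64Registers : RegisterSet {
        int indexOf(const std::string &name) const override {
            static const std::map<std::string, int> names = {{"pc", 0}, {"sp", 14}};
            auto it = names.find(name);
            return it == names.end() ? -1 : it->second;
        }
        uint64_t read(int index) const override { return regs_[index]; }
        uint64_t regs_[32] = {};
    };

    // Callers never see the architecture behind the interface.
    uint64_t programCounter(const RegisterSet &rs) {
        return rs.read(rs.indexOf("pc"));
    }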

expression evaluation

At a high level, this module glues together the symbol store module and the process inspection module. It has the language grammars built into it, or it interfaces with external parsing modules owned by the compiler. There is an implementation choice here whether to use a single grammar for a similar set of languages, or to create distinct grammar modules for different languages. For languages like C and C++ it’s probably easier to use a unified expression grammar.

Out of necessity there is some amount of code generation in the expression evaluation module. Even for basic debugger functions, it must be possible to evaluate expressions like Obj1.foo() + Obj2.bar(). Function and method calls need to be turned into target-specific call-site code so they can be executed by the target process. Because it needs to interact with the target process, the expression evaluation module will need to operate asynchronously by using services from the event module described below.
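A hypothetical sketch of that asynchronous shape: each target call hands a continuation to the event system instead of blocking, so the evaluator resumes only when the call’s completion event fires.

    #include <cstdint>
    #include <functional>

    struct EventSystem {
        // Stub: a real implementation would patch a call site into the
        // target, resume it, and invoke 'done' when the stop event arrives.
        void targetCall(uint64_t /*fnAddr*/, std::function<void(int64_t)> done) {
            done(42);  // placeholder result, for illustration only
        }
    };

    // Evaluate Obj1.foo() + Obj2.bar(): two chained target calls, then a sum.
    void evalSum(EventSystem &ev, uint64_t fooAddr, uint64_t barAddr,
                 std::function<void(int64_t)> done) {
        ev.targetCall(fooAddr, [&ev, barAddr, done](int64_t lhs) {
            ev.targetCall(barAddr, [lhs, done](int64_t rhs) {
                done(lhs + rhs);  // both calls have finished; combine results
            });
        });
    }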

If the debugger supports the ability to enter complete code blocks or new function definitions, those might be parsed here, or in the front-end tools. Depending on the technology involved, it might be better to have the front-end tools generate code and debug data via direct calls to the compiler, and pass object files into the debugger, rather than passing the code snippets into the debugger for code generation.

When parsing an expression, it’s very challenging to resolve symbol references correctly, particularly global symbols in the context of shared libraries. Programs are built and run in sequence by the compiler, linker and runtime linker. When necessary, the debugger should progressively resolve symbols in exactly the same way, either by calling APIs in those tools, or by emulating them as precisely as possible.

The results of the compiler and linker phases on the original code can be seen by examining the resulting executable or shared library on disk. But if a new expression is supplied by the user, the symbols should be resolved by first emulating what the compiler and static linker would have done. This is not difficult for C, but it’s more difficult for complex languages like C++.

Understanding how a global symbol will be resolved by the runtime linker may require interacting with the process itself, in a fairly advanced way. A program can load shared libraries with dlopen(), or they can be loaded automatically through possibly complex dependency chains specified in the shared libraries. Options can be embedded in the shared libraries or passed to dlopen that affect how global symbols are resolved, when they are resolved, and whether the symbols in the new shared library are made available for binding from other shared libraries. Because of these concerns, the resolution of global symbols at the “program scope” should be implemented in the expression evaluation module and not the symbol store module. The symbol store can be queried about global symbols, but if multiple matches are detected, it must return all possible matches, and the expression module should decide how the resolution should happen.
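A small example of why the files on disk aren’t enough: the flags passed to dlopen() change which definitions later loads can bind to. The library and symbol names here are made up; build with -ldl on Linux.

    #include <dlfcn.h>
    #include <cstdio>

    int main() {
        // RTLD_LOCAL: libplugin's globals are NOT made available for binding
        // by libraries loaded later; RTLD_GLOBAL would expose them.
        void *h = dlopen("libplugin.so", RTLD_NOW | RTLD_LOCAL);
        if (!h) { std::fprintf(stderr, "%s\n", dlerror()); return 1; }

        // The same name could resolve differently at program scope depending
        // on load order and flags; dlsym() against one handle disambiguates.
        void *sym = dlsym(h, "frobnicate");
        std::printf("frobnicate resolved to %p\n", sym);
        dlclose(h);
        return 0;
    }

A debugger evaluating an expression that mentions frobnicate has to reproduce exactly this kind of scoping decision, which is why program-scope resolution belongs in the expression evaluation module.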

symbol store

This module sees your program as a set of files on the disk (one or more executables, and matching shared libraries). It primarily deals with linker symbols (ELF/COFF) and debugging information (dwarf/stabs). Ideally this module runs with the rest of the debugger, so in a distributed context, it will need to load alien data formats. For example, it might need to load ELF symbols while running on a host that supports only COFF symbols natively. This means trying to keep up-to-date copies of alien system headers, and modifying them so they can be used where the debugger is built. This can be easy or hard depending on how often these kinds of headers are updated on the host operating systems.

For performance reasons, symbolic information needs to be managed carefully. Some symbolic information is necessary for resolving expressions anywhere in the program. Global symbols are needed, and some local symbols (like C static variables) that users might want to find using a global query. There’s also a big chunk of debugging information that’s only relevant when a program counter or stack frame directly references the code (line number information, local variable location information). Caching this information intelligently, and loading it on demand, has a dramatic effect on performance. Dwarf supports specific kinds of indexes for debugging information. Those indexes are a good starting point, but such needs change over time. The compiler can be enhanced to produce additional dwarf-style indexes, or the debugger can implement its own indexing algorithms over the complete debug information without help from index tables. One of the really useful indexes is a mapping from each header file to the list of object files that contain code from that header file. This helps to set breakpoints in macros and C++ templates.
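A sketch of that header-file index, with an invented interface: built once from the line-number tables, it answers “which object files contain code from this header?” when a breakpoint lands in a macro or template.

    #include <map>
    #include <set>
    #include <string>

    class HeaderIndex {
    public:
        // Record, while scanning line tables, that objectFile contains code
        // attributed to sourceFile (which may be a header).
        void add(const std::string &sourceFile, const std::string &objectFile) {
            index_[sourceFile].insert(objectFile);
        }
        const std::set<std::string> &objectsFor(const std::string &header) const {
            static const std::set<std::string> empty;
            auto it = index_.find(header);
            return it == index_.end() ? empty : it->second;
        }
    private:
        std::map<std::string, std::set<std::string>> index_;
    };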

It may be appropriate to have this module deal with emulating the compiler and linker symbol binding process. This lookup operation would need a starting scope containing the source line, function, class, object file, and executable or shared library. The symbol module would start looking outwards from this location to resolve a symbol according to the specific rules of the language associated with that starting scope.
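A minimal sketch of that outward walk, assuming the scopes have already been chained from innermost to outermost:

    #include <string>
    #include <vector>

    struct Scope {
        std::string name;                  // function, class, object file, ...
        std::vector<std::string> symbols;  // names defined directly here
        const Scope *parent;               // nullptr at the outermost scope
    };

    // Walk outward from the starting scope; the innermost definition wins.
    const Scope *resolve(const Scope *start, const std::string &sym) {
        for (const Scope *s = start; s != nullptr; s = s->parent)
            for (const auto &candidate : s->symbols)
                if (candidate == sym)
                    return s;
        return nullptr;  // fall through to a program-scope query
    }

Language-specific rules (using-directives, argument-dependent lookup, and so on) would hang off this basic walk.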

events

One of the really powerful modules in dbx is the event processing system. Supporting operating system features and compiler runtimes causes lots of back-and-forth operations between the debugger and the target process. (For example, user-land thread scheduling, OpenMP runtimes, runtime linkers.) The code to deal with this kind of interaction needs to be highly re-entrant because most of these operations can be overlapped during the execution of a program. The event system in dbx provides a clean way to encapsulate the code for implementing all these special back-and-forth operations, and in my opinion any modern debugger should have a very strong, centralized event processing system.

In order to cope with overlapping complex operations, you need a set of event handler templates that can be instantiated to watch for specific customized events. (Not a C++ template, just some kind of data structure representing a category of runtime event handlers.) For example, you need a “Stop at [Line]” template that can be instantiated as a “Stop at foo.c:12” handler. This handler instance will need to construct itself by instantiating a “Stop at [PC]” template as a “Stop at 0x5420” handler, and so on. Handlers often need several different kinds of subhandlers, or even arrays of subhandlers, in order to implement complex operations like “Step out of [Frame]”. Depending on which subhandlers fire, the handler may reconfigure itself to change what kinds of events it listens for without actually firing its own event. The lowest level subhandlers result in modifying the process to add traps or other code-patches. When the process stops, the lowest level events are passed into the handlers that are listening to them, and a carefully designed algorithm makes sure that all handlers are evaluated in the correct order so that each handler’s inputs are evaluated before it is.
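A skeletal illustration with invented types; the real machinery also plants traps and computes the evaluation order described above:

    #include <cstdint>
    #include <memory>
    #include <vector>

    struct Handler {
        virtual ~Handler() = default;
        virtual bool onEvent(uint64_t pc) = 0;  // returns true when it fires
        std::vector<std::unique_ptr<Handler>> subhandlers;
    };

    // "Stop at [PC]" template, instantiated for one concrete address.
    struct StopAtPC : Handler {
        explicit StopAtPC(uint64_t pc) : pc_(pc) {}  // would also plant a trap
        bool onEvent(uint64_t pc) override { return pc == pc_; }
        uint64_t pc_;
    };

    // "Stop at [Line]" builds itself out of a "Stop at [PC]" subhandler,
    // using an address the symbol store already resolved for the line.
    struct StopAtLine : Handler {
        explicit StopAtLine(uint64_t resolvedPC) {
            subhandlers.push_back(std::make_unique<StopAtPC>(resolvedPC));
        }
        bool onEvent(uint64_t pc) override {
            return subhandlers.front()->onEvent(pc);  // fires if its input fired
        }
    };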

Some handlers represent end user breakpoints or watch points. If none of those user handlers fire after processing a set of incoming events, then the process is resumed until more process-level events are generated.

To allow the combination of event handlers from disparate runtime support modules (like OpenMP inspection and runtime linking support) and functionality modules (like source line stepping, and calling functions in the target process), it would be nice to use a dynamically typed language for the event handlers themselves, ideally a scripting language. The debugger user interface could use the scripting interface to implement simple conditional breakpoints. Advanced users could directly access the scripting language for more complex operations.

Some of the events that need to be processed by the event system are fairly low-level, like stepping a long sequence of machine instructions one at a time. In order for this architecture to perform well in a distributed environment, some of the event handlers will need to be off-loaded directly to the process server, to avoid a round trip across the network. Running the handlers using an interpreted language would also make it easier to transfer handler code between process servers on different host architectures. The algorithm that arranges the order of evaluation of the event graph also needs to function correctly across the multiple processes doing the handler evaluation.

expert modules

There are additional smaller modules that serve as experts for interpreting program runtime layers like OpenMP runtimes, the runtime linker, dynamic type information, etc. Each of these modules consists of an exported API to enumerate and control process-wide or program-wide resources, and a tree of event handlers that help support the APIs. When an expert module is enabled, the event handler tree would be supplied to the event module for integration with the other event handlers. This allows it to operate simultaneously with other subsystems in the debugger. For example, in order to enumerate all the OpenMP parallel regions in a process, the OpenMP expert might need to make a target call into the process. This target call might need to access the runtime linker expert to step through a dynamic linkage point. If the dynamic linkage results in loading a new shared library, new symbols might need to be cached. When these other operations finish, the original handler will fire, and the user will see the resulting list of OpenMP parallel regions.

Expert modules with special purpose handler sets can be written outside the context of the traditional debugger. Application specific stand-alone tools can be written as additional front-ends to the debugger infrastructure. For example, a dynamic chart of the number of allocated memory objects could be displayed while running an application. Application writers can supply expert modules that provide functions via an existing debugger UI, to facilitate debugging that specific application. For example, a mail reading application can supply a custom debugger script so that an engineer debugging it can simulate the reception of a message by the mail tool without needing to interact directly with a mail server.

process server

This module encapsulates the OS-specific mechanisms for controlling and inspecting a running process or core file. These mechanisms include Linux ptrace and Solaris /proc. In a distributed debugger, this functionality needs to be fully encapsulated so that it can be exported over a wire protocol to the rest of the debugger. At a minimum, the module needs to set breakpoint traps, generate low-level call sites for the execution of function calls in the target process, read and write memory, spawn and kill processes, trap basic fork/exec/spawn operations, etc. There only needs to be one process server per remote host in a debugging session. It should be able to handle multiple processes on that host.
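On Linux, the lowest-level trap-setting operation might look roughly like this sketch (x86 assumed; a real server also worries about alignment, threads, and restoring the byte on removal):

    #include <sys/ptrace.h>
    #include <sys/types.h>
    #include <cerrno>
    #include <cstdint>

    // Plant a breakpoint: save the original word, patch in an int3 (0xCC).
    long setBreakpoint(pid_t pid, uint64_t addr, long *saved) {
        errno = 0;
        long word = ptrace(PTRACE_PEEKTEXT, pid, (void *)addr, nullptr);
        if (word == -1 && errno != 0)
            return -1;                           // read failed
        *saved = word;                           // keep for later removal
        long patched = (word & ~0xffL) | 0xccL;  // int3 in the low byte
        return ptrace(PTRACE_POKETEXT, pid, (void *)addr, (void *)patched);
    }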

If the debugger supports direct machine code generation based on arbitrary source, then the ABI information encapsulated here will also be needed in the machine code generation module in the main debugger process. The structure of these dependencies depends on the technology used to generate the rich machine code.

The process server only needs to deal with one host architecture at a time. It lives on the host that runs the processes it will connect to, and it can be compiled with architecture-specific configuration options. Some of the lower level platform-specific ABI information can be isolated in this module. In 64-bit SPARC programs, the stack pointer is offset by a fixed bias, in order to allow stack-relative machine instructions to access a more useful range of memory. So register O6 is the stack pointer register, but the number stored inside is not really the stack pointer value. This sort of information can be encapsulated in the process server.
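The SPARC V9 stack bias is 2047 bytes, so the de-biasing step the process server hides is a one-liner:

    #include <cstdint>

    constexpr uint64_t kSparcV9StackBias = 2047;

    // %o6 holds the stack pointer minus the bias; recover the real value.
    uint64_t actualStackPointer(uint64_t rawO6) {
        return rawO6 + kSparcV9StackBias;
    }

No other module should ever need to know this constant exists.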

In a distributed system, executable and shared library files may not be available for direct loading over the network by the symbol store module. In a widely distributed system, the host running the target process might be behind a firewall. The socket connection to the process server might be the only available channel to the remote system. In that case, the process server might need to act as a simplified file access service so that the symbol store module can load symbols and debug information by proxy.

There are always a few dependency glitches in any clean design. Ideally, symbol information should be given to the process server from the symbol module, but it may be desirable to have some local ELF symbol reading to prepare for some of the more advanced operations like intercepting and controlling fork/exec/spawn operations.

Part of the event tree is processed in this module. Event handlers will need to be decomposed automatically or explicitly into code that executes in the core debugger modules, and code that executes remotely in the process server. One way to do this will be to use a function call infrastructure within the handler programming language that makes it transparent whether an API call from a handler will execute locally or remotely. This opens the way for some pathological performance issues when the author of a handler doesn’t know the performance impact of individual API calls, so a relatively predictable arrangement should be made so that some specific kinds of handlers will always run on the process server.
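One possible shape for that transparency, sketched with invented names: handlers call through a runtime whose locality is a property of where it was created, and policy pins known-chatty handlers to the server side.

    #include <string>

    struct HandlerRuntime {
        enum Locality { Local, Remote };
        // Dispatch an API call wherever this runtime lives; handler code
        // cannot tell whether the call crossed the network.
        virtual long call(const std::string &api, long arg) = 0;
        virtual Locality locality() const = 0;
        virtual ~HandlerRuntime() = default;
    };

    // Instruction stepping is cheap per call but very chatty, so a sensible
    // policy is to only ever schedule this handler on a Remote runtime.
    long stepInstructions(HandlerRuntime &rt, long count) {
        long pc = 0;
        for (long i = 0; i < count; ++i)
            pc = rt.call("step_instruction", pc);  // api name is illustrative
        return pc;
    }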