CMT and MT

(Giri Mandalika has a new blog entry on how to use dbx’s thread-related commands. Check it out!)

This paper at ACM Queue is an excellent analysis of the state of the art in threading issues and problems, and what the development tools can do about it.

Software and the Concurrency Revolution

I’ve been thinking along these lines myself recently, so it was nice to see that someone else had done the work of putting it all into writing. There are a few things I would add. Hmmmm. It turned out to be more than a few things by the time I finished writing what follows. The paper is much more worth reading than my humble thoughts, so go read it first, and then come back and read my comments. It is fairly easy to read, and not that long.

Back already? Anyway, here are my thoughts on the ideas presented in the paper.

Who needs MT tools?

There are lots of apps that were designed from the ground up to be multi-threaded. These apps all have a consistent strategy for dealing with shared synchronization. I don’t think the developers on these projects are really clamoring for new thread-related tools and support. (Maybe they are. If so, clamor louder. Let me hear from you. ;-] ) The Solaris kernel developers used a home-grown tool called Warlock to do static checking of mutex usage (it later turned into LockLint), but they don’t seem to use it much anymore.

The apps I have heard from that needed the most help with finding deadlocks and tuning synchronization were very large serial apps that had been turned into multi-threaded apps without being redesigned.

This doesn’t mean we don’t need tools to help with this; it just changes my perspective a little. If you are in the midst of redesigning a large app, you need a whole bunch of other tools too: static analyzers to sort out your dependencies, source browsers, interface checkers, etc. None of these other tools is specific to MT programs. But I think MT programs have the general effect of ratcheting up the level of tool support you need across the board.

The best way to synchronize

The best way to manage the dependencies (any dependencies) between two modules is not to have any. The best way to share direct access to a data structure between two modules is not to do it. The best way to safely share synchronization between two modules is not to share it. If at all possible, modules should use synchronization internally only, on the data that is controlled by that module itself. This doesn’t mean that a module can be converted from thread-unsafe to thread-safe without disturbing the users of that module. Very often the API will need to be modified to export methods and functions that are structured so that they can be implemented efficiently with synchronization that stays inside the module.

For example, if you have a module that manages a group of items internally, the last thing you want to do is offer an iterator: an iterator forces the caller either to hold the module’s lock across a whole sequence of calls or to cope with the collection changing underneath it. If your module needs to support selecting a subset of these items, offer a “search()” function that takes a predicate function as an argument and returns a list of item references. That way the module itself is responsible for iterating safely and efficiently over its own collection.

As part of exporting this search() interface, you would document that the module prohibits reentry back into itself from the predicate (at least for the same module instance). Ideally, you would also enforce that restriction within the module itself.
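Here is a minimal sketch of that idea in C with POSIX threads. The names item_set, item_set_add, and item_set_search are hypothetical. The module keeps its mutex private, and search() runs the caller’s predicate over the collection while holding that mutex. The result is a caller-owned array of copies of the matching items. I chose copies rather than pointers into the module’s storage, because such pointers would escape the lock. Reentry is enforced with a per-thread list of searches in progress. If the predicate calls back into the same instance, the call fails with EDEADLK instead of deadlocking. That enforcement scheme is one possible choice, not the only one.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical module: a bounded set of integer items. The mutex is
 * private to the module and is never handed out to callers. */
typedef struct {
    pthread_mutex_t lock;
    int            *items;
    size_t          count, cap;
} item_set;

typedef bool (*item_pred)(int item, void *ctx);

/* Per-thread stack of searches in progress, used to detect a predicate
 * that calls back into the same item_set instance. */
struct search_frame {
    const item_set      *set;
    struct search_frame *prev;
};
static _Thread_local struct search_frame *active_searches;

static bool reentered(const item_set *s) {
    for (const struct search_frame *f = active_searches; f != NULL; f = f->prev)
        if (f->set == s)
            return true;
    return false;
}

int item_set_init(item_set *s, size_t cap) {
    s->items = malloc((cap ? cap : 1) * sizeof *s->items);
    if (s->items == NULL)
        return ENOMEM;
    s->count = 0;
    s->cap = cap;
    int rc = pthread_mutex_init(&s->lock, NULL);
    if (rc != 0)
        free(s->items);
    return rc;
}

void item_set_destroy(item_set *s) {
    pthread_mutex_destroy(&s->lock);
    free(s->items);
}

int item_set_add(item_set *s, int item) {
    if (reentered(s))
        return EDEADLK;             /* called from inside our own predicate */
    pthread_mutex_lock(&s->lock);
    int rc = 0;
    if (s->count == s->cap)
        rc = ENOSPC;
    else
        s->items[s->count++] = item;
    pthread_mutex_unlock(&s->lock);
    return rc;
}

/* Runs pred over every item under the module's own lock and returns the
 * matching items as a malloc'd array in *out (the caller frees it). */
int item_set_search(item_set *s, item_pred pred, void *ctx,
                    int **out, size_t *nout) {
    if (reentered(s))
        return EDEADLK;
    pthread_mutex_lock(&s->lock);
    int *result = malloc((s->count ? s->count : 1) * sizeof *result);
    if (result == NULL) {
        pthread_mutex_unlock(&s->lock);
        return ENOMEM;
    }
    struct search_frame frame = { s, active_searches };
    active_searches = &frame;       /* mark this instance as "in search" */
    size_t n = 0;
    for (size_t i = 0; i < s->count; i++)
        if (pred(s->items[i], ctx))
            result[n++] = s->items[i];
    active_searches = frame.prev;   /* leave "in search" before unlocking */
    pthread_mutex_unlock(&s->lock);
    *out = result;
    *nout = n;
    return 0;
}
```

A caller passes a predicate such as "is this item even?" along with an optional context pointer, gets back an array and a count, and frees the array when done. The caller never sees the lock. The only way it can get the locking wrong is by reentering from the predicate, and that case is reported as an error.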

The paper listed above touches on these issues when it talks about the benefits that functional programming languages offer (or don’t offer).

Locking around calls to other modules

The paper mentions several times that holding a lock while calling into another module can provoke deadlock. This is somewhat oversimplified. It is really only true when the module you call can call back into the current module, or when you share synchronization objects with the module you’re calling. In those circumstances other problems show up too, such as data structures that were never made robust against reentrant calls. Calling a well-defined module while holding a lock doesn’t have to be problematic.

This “reentrancy” issue is also a big deal for distributed applications. A distributed application that fits a strict “client/server” model is fairly straightforward, but many distributed apps have multiple independent participants, or have callbacks from the server to the client. In those cases reentrancy can bite you pretty easily.

Distributed programs

Many tools for dealing with MT programs are just as applicable to distributed programs. To put it in different words, many tools for dealing with shared-memory parallelism are also applicable to distributed-memory parallelism. MT programs are more likely to need such tools because, by their nature, their threads are more tightly coupled.

Sedimentation as simplification

Sedimentation is what happens in a software system over long periods of time, as functionality that lives in the app is gradually implemented in the platform or in the software framework the application uses. (I know I’ve seen the term used before for software, but I don’t see that usage on Google.)

An application today may consist of multiple interacting modules. As some of these modules sediment into the platform (whether that is Solaris, Linux, or the Java VM), the remaining software (what is considered “the app”) becomes simpler and more maintainable. Of course, new functionality is being added to the app at the same time, so the net result could be either simpler or more complex.

A program that is parallel today might become a serial program in a few years, when it is redesigned to use a new module available from the platform. An OpenMP program today might be rewritten to call the performance library tomorrow; from the application’s point of view, it has gone from being a parallel program to being a serial program. The Solaris OS is multi-threaded, but that doesn’t make every program that runs on Solaris multi-threaded. What’s important is not whether multiple threads are running somewhere inside your program. What’s important is whether thread creation and thread synchronization need to be managed by your program.

The process I have described here as “sedimentation” implies a huge amount of testing, documentation, and consensus building (to create an open, accepted standard for a platform extension). One side effect of all that work is that the new module, as part of the platform, is easier to use than it was before. This is a natural part of the software evolutionary cycle.

All of the problems described in the paper are made worse by a large, complex app with poorly defined interfaces and poor modularity. One way of looking at the future of threads is that the need to add threads will force applications to be “cleaned up”. It will increase the pressure to make each part of the app more modular, which in turn increases testability and decreases shared synchronization.

Benefiting from CPU performance growth

quote:

The concurrency revolution is primarily a software revolution. The difficult problem is not building multicore hardware, but programming it in a way that lets mainstream applications benefit from the continued exponential growth in CPU performance.

Exactly which apps need to take advantage of the “continued exponential growth in CPU performance”?

There are large applications and systems today that are not CPU bound, but are instead performance constrained by the network, or disk I/O, or constrained by a poor design, or constrained by the performance of the platform they run on top of.

For applications that are CPU bound today, some will need to become more heavily threaded to take advantage of CMT technology. Many of these apps are already threaded, and will need incremental work to split their workload into smaller chunks, so that more threads can usefully contribute at the same time.
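As a rough sketch of what “smaller chunks of work” means, here is a CPU-bound loop (a sum over an array) split into nchunks contiguous slices, with one POSIX thread per slice. The names parallel_sum, sum_chunk, and MAX_CHUNKS are hypothetical. A real program would more likely use a thread pool or OpenMP than create a thread per chunk. Each worker writes only to its own result slot, so the threads share no writable data and need no lock. Raising nchunks is what lets more hardware threads contribute.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_CHUNKS 64   /* hypothetical cap on the number of worker threads */

/* One contiguous slice of the input. Each worker sums its own slice into
 * its own partial field, so workers share no writable data. */
typedef struct {
    const double *data;
    size_t        begin, end;
    double        partial;
} chunk;

static void *sum_chunk(void *arg) {
    chunk *c = arg;
    double s = 0.0;
    for (size_t i = c->begin; i < c->end; i++)
        s += c->data[i];
    c->partial = s;
    return NULL;
}

/* Splits data[0..n) into nchunks nearly equal slices, one thread each.
 * Returns 0 and stores the total in *out, or the pthread_create error. */
int parallel_sum(const double *data, size_t n, size_t nchunks, double *out) {
    if (nchunks == 0)
        nchunks = 1;
    if (nchunks > MAX_CHUNKS)
        nchunks = MAX_CHUNKS;
    chunk c[MAX_CHUNKS];
    pthread_t tid[MAX_CHUNKS];
    size_t started = 0;
    int rc = 0;
    for (size_t k = 0; k < nchunks; k++) {
        c[k].data = data;
        c[k].begin = n * k / nchunks;       /* slice boundaries cover [0, n) */
        c[k].end = n * (k + 1) / nchunks;
        c[k].partial = 0.0;
        rc = pthread_create(&tid[k], NULL, sum_chunk, &c[k]);
        if (rc != 0)
            break;
        started++;
    }
    double total = 0.0;
    for (size_t k = 0; k < started; k++) {  /* always join what was started */
        pthread_join(tid[k], NULL);
        total += c[k].partial;
    }
    if (rc == 0)
        *out = total;
    return rc;
}
```

Splitting the sum changes the order of floating-point additions, so for general data the result can differ from a serial loop in the last few bits. Reworking a serial algorithm into chunks often forces small decisions like that one.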

For applications that are not CPU bound, there are also very good reasons to thread your application to increase performance (for example, overlapping I/O waits with useful work), but those reasons are not related to CMT technology. The advent of CMT technology doesn’t really represent a big change in the way apps need to be developed. It just amplifies a need for better MT tools that we’ve had for a long time.