Archive for September, 2006

Enterprise versus The Developer

Monday, September 18th, 2006

Note: I’m just an engineer at Sun.  What follows is my own personal perspective, and not to be taken as Sun’s official opinion in any way.  To the best of my knowledge I’m not giving away any trade secrets, but I am speaking frankly about Sun’s business model.

I attended an all-hands meeting with Rich Green (Sun’s head of software) today, and there was some discussion at the end about Sun’s approach to the desktop market.  During the discussion I got bitten by a sudden perspective. That happens to me a lot, but I don’t often take the time to write up my perspective or try to communicate it to people.  This time I figured I’d share my ideas.

I guess you could summarize the whole rest of this essay by saying that I believe that, in the long run, the hearts and minds of the software developer community at large will be won or lost on the basis of the desktop. To understand what I mean by this, read on.

I use Windows at home.

As a Sun employee, I’ve had the usual dilemma for many years now. Should I run Solaris at home? I have a computer, I know how to do my own administration. I could run anything I wanted to. Solaris, Linux, Apple, Windows, whatever. Today I’m running Windows at home because I don’t have the time or energy to maintain more than one computer, or to maintain more than one operating system. And Windows runs the software I want to run: games, productivity software, random internet crap. (No snide comments about viruses, please.) I have the same problems as 100 million other people, and when I have a problem, I just google “<my problem>” and up pops the answer.

So how does this relate to Sun and Sun’s business? Well, I’m getting to that. During this discussion today about the future of Sun in the desktop market, I was listening at home on my Windows box. But I’d spent the entire day earlier developing Solaris software. With my Windows box. I count myself as an engineer. Most days, I spend most of my time concerned with code. But the vast majority of tasks I do can be done natively on Windows. I read and write email. I update internal and external wiki sites. I browse Sun’s internal web. I update bugs using a Java bugster client. I read PDF specs. I use term windows to log into Solaris machines to build things, reproduce bugs, and do other tasks.

Of course, for more intense code hacking, I need XEmacs with local NFS access to my sources, so there are some things I can’t do from home. But I could do them just fine from a Windows machine at work, if I bothered to take my laptop to work.  I was operating today in what I call “hybrid developer” mode. Using one desktop OS to develop software for another OS.

Sun does Enterprise.

Sun has enterprise class hardware, lots of big iron. Sun has an enterprise class OS, Solaris. Sun has an enterprise class software stack with open standards based servers. Sun’s business seems to be totally oriented towards feeding large IT departments, telcos, banks, etc. what they need. Big iron. But what about Sun’s smaller rack-mount systems, you ask? And what about Sun’s desktop machines? In my opinion, Sun’s smaller hardware boxes are essentially spin-offs to capitalize on technology that we developed for enterprise-class machines and software. Inside Sun we’re focussed on the customers who buy our stuff. (As a stockholder, I’m very pleased to see this!) But it represents a bias in our thinking, and a bias in Sun’s internal engineering culture.

I love Solaris, but I’m not an admin.

So here’s my deal. I’m a Solaris developer. Wait, let me be more clear. I’m a developer of software that RUNS on Solaris. I am NOT a developer OF Solaris. I’m not in the kernel group. I’m not in the desktop group. My job doesn’t require me to run BFU to install the latest nightly build of Solaris. I can do Solaris system administration if I need to, but I don’t do it for fun, and it’s not part of my job. I love using Solaris, but I don’t love administering it. Solaris is pretty damn painful to administer compared to a desktop OS. But that’s not a fair comparison because Solaris is totally geared towards enterprise users, and not towards desktop users.

Software Updaters.

One area of functionality I wanted to talk about is web-based software installers and updaters. There are two groups of people I want to talk about here. Each group sees a problem, and is trying to solve it. And each group thinks the other group’s problem is the same as theirs. (Actually, these groups are imaginary groups, because I’m really talking about perspectives and not individual people.  The perspective differences in Sun cause language problems, communication breakdown and lack of synergy between groups.)

One group has the enterprise perspective. It’s focused on things like smpatch and updatemanager for delivering Solaris patches. The other group has the desktop perspective. It’s focused on things like pkg-get for delivering things like Ruby compiled for Solaris x86. The desktop perspective says: Linux uses apt-get (or red carpet or whatever) to update application software and OS packages both, so why can’t Solaris just convert to using something better than patchadd and pkgadd? The enterprise perspective says: Well, once we get updatemanager up and running a little better, we can eventually start to include unbundled applications in our centrally controlled server-based software distribution model. Both perspectives make perfect sense, but only if you’re looking in one specific direction.

How to get Developers.

Sun has several approaches it could take to getting Solaris developers. I’ll list them in order of the impact they would have on developers, biggest first: 1) Get Solaris on lots of desktops so that it will be a natural and easy starting point for new software development. 2) Embrace more fully the “hybrid” nature of much of today’s Solaris development. 3) Do whatever we can to encourage multi-platform projects to support and distribute software for as many flavors of Solaris as possible.

Option 1 puts the user dead center inside a rich environment of wonderful Sun technology like dtrace, ZFS, zones, Xen support, etc. This will be contagious. If you’re familiar with the Sun desktop, and you’re familiar with the Sun administration commands, your software will end up working better on Solaris, and you’ll be more likely to stick with Solaris. That’s good for Sun.

Option 2 is one step removed. Hybrid development is what I do at home. Use a Windows, or Linux, or OS X desktop, but develop software for Solaris. If you’re doing primary development on Solaris, your software will be more likely to run best on Solaris. I might still be using Solaris as my primary development platform for the software that I build, but my daily interaction will be with a different OS. The one that runs on my desktop.

Option 3 is another step removed. Sun will benefit if more open source software is compiled and distributed for Solaris. If Sun can make it easier to port software to Solaris, that’s a step forward, too. And there’s plenty we could do to help that.

All of these options are opportunities for Sun to focus on, if we really want to get more Solaris developers. Development tools support and basic OS support could be tuned towards supporting those kinds of users.

But it’s a slippery slope.

Having your OS be on the developer’s desktop is the core of getting a really healthy developer community. As the user gets more and more removed from Solaris, they start to see it as just another platform that they might or might not port to.

Graphical display of thread synchronization

Monday, September 11th, 2006

One of the things that’s really hard about debugging threaded programs is tracking down which threads own which locks, and figuring out which locks they are supposed to own. In other words, synchronization bugs. The most difficult symptom to debug is data corruption, because it’s very hard to track down exactly where things start to go wrong. In those cases where your program actually ends up in a deadlock, you get to start with a smoking gun, and work from there. Much easier.

One way to find synchronization bugs in your program is to use Sun’s new Data Race Detection Tool. You can find a preview version of that tool in Sun Studio Express 2.

Another way to hunt for bugs is to use dbx’s built-in synchronization debugging commands.  You can list all the locks in your program, and find out which threads own them, and which threads are waiting on them.

Here is some output from my dining philosophers program:

(dbx) syncs
All locks currently known to libthread:
forks (0x00021670): thread  mutex(locked)
forks+0x18 (0x00021688): thread  mutex(locked)
forks+0x30 (0x000216a0): thread  mutex(locked)
forks+0x48 (0x000216b8): thread  mutex(locked)
forks+0x60 (0x000216d0): thread  mutex(locked)
foodlock (0x00021708): thread  mutex(unlocked)

(dbx) sync -info 0x00021670
forks (0x21670): thread  mutex(locked)
Lock owned by t@2
Threads blocked by this lock are:
        t@6 a l@6 philosopher() sleep on 0x21670 in __lwp_park()

Okay, that’s fine. But obviously I need to pull out a pad of paper and start drawing boxes if I want to see where my bugs are. Of course, there are tools for drawing boxes and arrows, and all you have to do to use them is to convert your data into XML.
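Before any XML gets written, the dbx text has to become structured data. Here is a rough sketch of that step in Python (my real script is perl, and this is not it). The function name parse_sync_info and the regular expressions are made up for illustration, and they assume the report always looks like the sample output shown above.

```python
import re

# Sample `sync -info` report, copied from the dbx session above.
SAMPLE = """\
forks (0x21670): thread  mutex(locked)
Lock owned by t@2
Threads blocked by this lock are:
        t@6 a l@6 philosopher() sleep on 0x21670 in __lwp_park()
"""

def parse_sync_info(text):
    """Return (lock_name, owner, waiters) for one `sync -info` report.

    Hypothetical helper: the patterns below are guesses based only on
    the sample output, not on any documented dbx output format.
    """
    lock_name = None
    owner = None
    waiters = []
    for line in text.splitlines():
        stripped = line.strip()
        # Header line, e.g. "forks (0x21670): thread  mutex(locked)"
        header = re.match(r"^(\S+) \((0x[0-9a-fA-F]+)\):", stripped)
        if header and lock_name is None:
            lock_name = header.group(1)
            continue
        # Owner line, e.g. "Lock owned by t@2"
        owned = re.match(r"^Lock owned by (t@\d+)$", stripped)
        if owned:
            owner = owned.group(1)
            continue
        # Waiter line, e.g. "t@6 a l@6 philosopher() sleep on 0x21670 ..."
        blocked = re.match(r"^(t@\d+)\b.*\bsleep on\b", stripped)
        if blocked:
            waiters.append(blocked.group(1))
    return lock_name, owner, waiters

print(parse_sync_info(SAMPLE))
```

Each lock then boils down to one owner and a list of waiters, which is all a graph needs.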

So I wrote a little ksh script and a little perl script, and presto, instant pictures. Well, I had to download a graph editing/layout tool … and I had to learn how to use it. But that wasn’t so bad.
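To give a feel for what the XML side of such a script involves, here is a minimal Python sketch that writes GraphML. It is not the perl script itself. The function name to_graphml and the edge direction (waiting thread, then lock, then owning thread) are assumptions for illustration, and it emits plain GraphML only, without any of the yEd-specific label markup, so a real script would need more than this to get labeled boxes.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"

def to_graphml(edges):
    """Build a minimal GraphML document from (source, target) name pairs.

    Hypothetical helper: node ids are just the thread and lock names.
    """
    root = ET.Element("graphml", xmlns=GRAPHML_NS)
    graph = ET.SubElement(root, "graph", id="G", edgedefault="directed")
    seen = []
    # Emit each distinct node once, in first-seen order.
    for source, target in edges:
        for name in (source, target):
            if name not in seen:
                seen.append(name)
                ET.SubElement(graph, "node", id=name)
    # Then emit one edge element per pair.
    for index, (source, target) in enumerate(edges):
        ET.SubElement(graph, "edge", id=f"e{index}",
                      source=source, target=target)
    return ET.tostring(root, encoding="unicode")

# Assumed convention: t@6 waits on "forks", and "forks" is held by t@2.
print(to_graphml([("t@6", "forks"), ("forks", "t@2")]))
```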

When I first ran my dining philosophers program, now don’t laugh, I didn’t actually unlock my eating utensils. I wrote the unlocks, but then I rearranged some stuff, and they got dropped on the floor. So the first time I ran it, I got a deadlock.
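For anyone who hasn’t made this particular mistake yet, here is a tiny Python analogue of it. This is not dine.c, which is C with pthreads. The names fork, eat_without_unlock and try_to_eat_again are invented for the sketch, and the timeout is only there so the demo reports the problem instead of hanging forever.

```python
import threading

# One "fork" shared by the philosophers (hypothetical stand-in for forks[]).
fork = threading.Lock()

def eat_without_unlock():
    """Pick up the fork and eat, but forget to put the fork down."""
    fork.acquire()
    # ... eat ...
    # BUG: the matching fork.release() got dropped on the floor.

def try_to_eat_again(timeout=0.1):
    """Try to pick up the fork again; give up after `timeout` seconds."""
    got_it = fork.acquire(timeout=timeout)
    if got_it:
        fork.release()
    return got_it

eat_without_unlock()
print(try_to_eat_again())
```

Without the timeout, the second acquire would simply block forever, which is the same place my philosophers ended up.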

My original plan was:

  • write functioning dining philosophers program
  • inject artificial bug
  • write graph utility
  • write blog

I ended up executing a slightly different plan:

  • write buggy dining philosophers program
  • get deadlock
  • write graph utility
  • fix dining philosophers program
  • write blog

Anyway, here is the picture that resulted from my deadlocked program. The graph edge with the same source/destination node is a dead giveaway. 🙁 The t@2 names are the dbx names for the threads. “forks” is the name of the array variable that holds the locks representing the eating utensils for the dining philosophers. So “forks” is one lock, and “forks+0xNN” is another lock. (See source code link below.)
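You don’t strictly need the picture to spot that kind of edge. As a rough sketch in Python (again, not part of my scripts), suppose each lock has been reduced to a (lock name, owner, waiters) record. The helper names and the sample records below are hypothetical, made up to show a thread blocked on a lock it already owns.

```python
def wait_for_edges(locks):
    """Return (waiter, owner, lock_name) for every blocked thread."""
    edges = []
    for lock_name, owner, waiters in locks:
        if owner is None:
            continue  # an unlocked lock blocks nobody
        for waiter in waiters:
            edges.append((waiter, owner, lock_name))
    return edges

def self_deadlocks(locks):
    """Return (thread, lock_name) pairs where a thread waits on itself."""
    return [(waiter, lock_name)
            for waiter, owner, lock_name in wait_for_edges(locks)
            if waiter == owner]

# Hypothetical records, not taken from the dbx session above.
locks = [
    ("forks", "t@2", ["t@2", "t@6"]),
    ("foodlock", None, []),
]
print(self_deadlocks(locks))
```

This only catches the simplest case, an edge whose source and destination are the same thread. Longer cycles through several threads would need a real cycle search.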

sync graph with bug

Here is a picture when the program is working right. In other words, after I added my missing unlock statements.

sync graph for working program

The ksh function is copied into the comments at the top of the perl script. So to run this demo yourself, here are the installation instructions.

  • download dine.c
  • download syncs_to_graphml
  • install syncs_to_graphml somewhere on your search path (or edit the ksh script to find it)
  • copy the ksh script out of the comments in the perl script and into your ~/.dbxrc file
  • download the yEd program, and get it up and running (written in java, so it’s easy to set up)

To run the demo:

  • compile dine.c with -g and -lpthread
  • load it into dbx
  • run it, and stop the program in the middle (ctrl-C)
  • use the new syncgraph command inside dbx
  • load the output file /tmp/syncgraph.graphml into the yEd editor
  • use Tools -> Fit Node To Label (hit OK)
  • use Layout -> Classic (hit OK)

At that point, you should get a picture of the threads and their locks.

From there you can play around with the various layout options for arranging the nodes in the graph. Don’t be annoyed at all the little properties and numbers you can set. I just ignore those most of the time. You can also export the image as jpg, pdf, etc.