Archive for the ‘Compilers’ Category

On Iteration by Andrei Alexandrescu

Wednesday, June 23rd, 2010

I just finished reading a great article on iterators by Andrei Alexandrescu.  Mr. Alexandrescu is a contributor to the D programming language.  In this paper, he discusses the background of iterator implementations including C++ STL iterators, and then goes on to outline a new model for iterators.  It’s very readable, I recommend it.

http://www.informit.com/articles/article.aspx?p=1407357

To get a more readable all-in-one page, click on the “print” link on the page above, or go here:

http://www.informit.com/articles/printerfriendly.aspx?p=1407357

Types, Objects and Generic Types

Wednesday, February 24th, 2010

Assume the axiom that punctuation is evil.  Using multiple kinds of brackets for similar functionality in a language is unnecessarily complex.

The result is:

The relationship between a generic type and it’s concrete types should be expressed in the same way as for the relationship between a type and the objects that are instances of it.

In C++, constructors use: foo(object1, object2), but templates use Foo<Baz,Bar>

Can we unify the syntax for these similar concepts without making the language complex?

It’s completely clear to me why using different punctuation makes it easier to explain how the compiler is implemented, and easier to implement the compiler.  But I don’t think the distinction is necessary to make the source code easier to read and write.

Let’s say we only have “objects”. “Type” would be a role that on object plays with respect to another object. The compiler simply instantiates some objects at compile time, as part of the compilation process.  That seems like a clear concept for coders to understand.

The goal of a programming language is to facilitate the writing and maintaining of software.  It’s not to make the compiler’s job easier.

Facets of Programming

Tuesday, November 10th, 2009

I’ve been thinking recently about the fact that the average piece of software code includes instructions to the compiler mixed together with instructions that should be executed at runtime.  Type declarations are instructions to the compiler. Most of the general sequential code is instructions that should be executed at runtime.  It occurred to me these are just two of many facets of software.  It would be nice to enable all these facets to be mixed together into one document so that the author of the software can keep all the facets consistent.

Another facet is specifications or unit tests.  I group those together because the way they’re tied to software at the code level is very similar, and they serve similar purposes.  There is an approach to coding called “Test Driven Design” where unit tests are written simultaneously with individual chunks of code.  There is a variant of this called “Behavior Driven Design”.  I was exposed to BDD in the latest Scala book (Programming Scala) and that was when I realized the TDD is really about verifiable specification, not so much about testing.

I really don’t want to use something that’s just a “programming language”, I want to use a “Software Authoring System”.

So what are the facets that a good “Software Authoring System” needs?

Runtime instructions: The purest expression of this facet is in dynamically typed languages, because they omit static type declarations.

Compiler instructions: Static type declarations for variables are instructions to the compiler. Type definitions themselves (in static or dynamic languages) are partly for the benefit of the compiler, and partly for the specification of runtime behavior.  Explicit testing and runtime manipulation of types (metaprogramming) uses types as part of the runtime behavior of the program. Virtual dispatch uses type information to determine runtime behavior. But non-virtual dispatch is really just a hint to the compiler about what code is going to be associated with what data. The behavior of such code is wired down at compile time.  The compiler uses it to optimize, and report programming errors back to the user.

The way that instructions are provided to the compiler should be rethought.  The declarative style of such instructions should be retained, but the functionality should be extensible through code that’s integrated with the project code. If I don’t like the way the static type system works (as supplied by the environment), I should be able to write extensions to it that will be executed by the compiler when it compiles my code. Among other benefits, this would allow me to implement better Domain Specific Languages and add better support for static analysis tools.  Moving the language complexity associated with static typing into a user-extensible library would also streamline the core language specification. The implementation of this feature would be more natural in a language where the compiler could just as easily interpret code as compile it, like dynamic interpreted languages.

Documentation: Embedding chunks of documentation inside your source code is a good start, and extracting method signatures is also useful (ala javadoc).  But a truly integrated system could provide much more information about interface specifications, preconditions, postconditions, etc.

Interface specification: If a public function takes arguments including a list and an integer that must be less than or equal to the length of the list, how does the author encode that information into the source code?  They can put it in comments.  They can add an assert statement (which will likely be ignored by the compiler, optimizer, documentation system etc). They can use TDD to create a test case that ensures the module throws an exception if the precondition is violated.  None of that goes far enough.  This kind of specification needs to be supported directly by the programming language and tied into the other facets of programming.

Module definition: The source code structures used to create a piece of software (a reusable module) are often not the same structures that you want to use to control how that software is used by other components.  That’s why programming languages support Classes and instances for object oriented design, and also support some concept of modules or packages for controlling the import and export of software interfaces to other components.  In most cases, this module/package support is a very thin layer glued onto the outside of a programming language.  For example, when creating a shared library on UNIX, there are linker-specific ways to enumerate which symbols are visible to consumers of the library.  There are also platform-specific hooks to allow this information to be passed into the linker from the source code, but again, it’s not truly integrated into the programming language.

Optimization: The author of a component usually needs to concern themselves (at some level) with basic choices that affect performance.  In some cases this requires in-depth bit-twiddling and care selection of compiler options, but in some cases it just means choosing data structure implementations that are appropriate for the task at hand.  As an author, I don’t want to have to use the Makefile to assign different optimization levels to different source files. I’d like to declare that a particular chunk of code needs to be heavily optimized and have the compiler just do the right thing.

Binding: Two kinds of binding are important to me as an author. My component will need to bind to the implementations of the interfaces it needs to be complete. Also, other components will bind to my component. The way I facilitate external binding is really part of the “module definition” facet discussed above.  The binding facet concerns itself with how to control the way that a component binds to its own required components. In some cases you want this binding to be via copy inclusion (think of archive libraries or combining .class files into a .jar file). In some cases you want to bind to external component, and include a version dependency in your component’s binary representation. In my source code I can say “import webservices.securesocket”, should I really have to go over to the makefile or build.xml in order to specify which version of the package I need to depend on?  A lot of what currently needs to go in the build structures should be folded into a new style of “rich” source code.

Static analysis: There’s a very useful development cycle in statically typed languages where you iterate between the compiler and editor while the compiler tells you about static typing errors in your source code.  But there are other static analysis tools besides the compiler.  The “compiler instructions” programming facet that I described above should be extended to include giving instructions to multiple static analysis tools in a unified way. Alternatively, the compiler can be extended as a universal front-end to outside analysis tools. Either way, this facet of programming should be expanded.

With this understanding as a basis, the next exercise would be to define a streamlined language that could be used as the basis for this kind of modern authoring system.

The next big concurrent language

Wednesday, October 7th, 2009

Tim Bray has been writing his thoughts recently on the topic of the next big language for concurrency.

Let me start by saying I’m completely an arm-chair quarterback here, I’ve never used a functional language for a real project, but I’ve worked in the area of development tools and multi-threaded applications for many years. I’ve watched a lot of language extensions and libraries come and go over the years claiming to “solve” the problems of parallel coding.  I’ve also seen some pretty fancy analysis done by compilers, which seems like it could be better leveraged if the languages had better ways to communicate the intent of the code author.

Do we need a new language for concurrency?

My assumption is that the development of OO programming was the result of increasing software complexity, not a response to a particular change in the capabilities of computer hardware (like multi-core is hitting us today).

Over the last many years, distributed networks of computers have been a very popular platform for writing software.  There are many libraries and frameworks for dealing with distributed systems, but specific languages that address that need (I’m sure there are some) have not become popular in general.  Distributed programs are still written mostly in languages that don’t have any specific features to support distributed programming.

So I don’t think the need for concurrency itself will drive the adoption of a new language.  The compelling argument for me is the possibility that the needs of multi-core (and hence multi-threaded) programming may drive software complexity far enough that we need another big leap in programming technology.  I’m skeptical that’s the case, but it won’t stop me from theorizing about what the next leap in programming technology might be.  🙂

Eventually we’ll need a new language, what will it be like?

Global state makes multi-threaded programs difficult to write and maintain because it requires synchronization.  The problem is not the synchronization, the problem is the global state. That’s a lesson I take from functional languages. Previous attempts to address concurrent programming tended to focus on encapsulating the synchronization, instead of encapsulating (or abstracting away) the global state.  For example, a parallel language extension that allows you to specify a matrix operation without needing to name any iterator variables has abstracted away the global state (in this case, the iterator variable).  A language feature (like OpenMP) that tells the compiler where and how to synchronize, but still requires you to code the iteration yourself, is hiding the synchronization but not the the global state.

It’s tempting to look for a language that emphasizes functional programming in its design, but that’s not necessarily the right approach.  There are many difficult and complex aspects to writing software, and the problem of global state is just one more. The right response is to look for a language that makes it easy to encapsulate global state.  In general, functional languages don’t necessarily make it easy to encapsulate global state. I think the correct response is to look at languages that can abstract away the complexity of MT coding and global state in effective ways.

So here’s an analogy: A long time ago I took a course on the software APIs used to internationalize software.  My main take-away was the the best way to internationalize any sort of library was to remove all the messages from the library, and only return diagnostic codes.  In other words, the best way to internationalize code is not to have to.  Similarly, the best way to synchronize code is not to have to. You want to encapsulate it into a specialized module.

In a layered application that is concurrent, you want to focus on making the lowest layers of software completely reentrant (that is, side-effect free). It’s generally not a big deal if the very top layer has global state, as long as the top layer is sufficiently thin. You want a language that makes it easy to write side-effect free code, but there is still lots of code that needs to have side-effects that can’t be ignored.

So it seems to me that the key feature we should be looking for is a significantly increased ability to abstract away implementation details.  By the way, any functional language that requires coders to understand how/why/when to write a tail-recursive function loses out big time in the abstraction department.

I was recently inspired by a paper in IEEE Computer that talked about a research language called SequenceL. I discussed it in a previous blog. The benefits of SequenceL are described as the ability to write executable code that maps as directly as possible into a requirements specification in the jargon of the problem domain.  This meshes with the recent discussions of DSLs (domain specific languages) as a good way to encapsulate various concurrent implementations.

Check out my last blog entry about SequenceL, and read the paper, it’s very well written.  If you have a direct link for it, please let me know.

SequenceL : Declarative Language Design

Wednesday, September 16th, 2009

I faithfully scan the tables of contents for IEEE Computer every time it comes out, and every now and then there’s a paper that I find both interesting and well written.  Today I found one called: “Taking Parnas’s Principles to the Next Level: Declarative Language Design” by Daniel E. Cooke and J. Nelson Rushton, at Texas Tech University.  I’d never heard of Parnas, but the paper discusses a programming language called SequenceL, which addresses some of the issue I have with programming language design.  Specifically, both functional and procedural languages make it necessary to “wire down” an implementation too much, and don’t allow enough leeway for an intelligent compiler or an intelligent library to offer a generic software service.  The language discussed in the paper revolves around an increased reliance on declarative syntax to specify a richer set of functionality. You might also describe the approach as heavily polymorphic, but I don’t think that description really captures the elegance apparent in the examples.

The emphasis in the paper seems to be on the ability to create an algorithm implementation that is very close to a natural language specification of the requirements.  In fields where that’s an issue (space shuttle software, in the article) that seems like a good measure of success. But I think the increased level of data hiding and abstraction also lends itself to supporting much more freedom in the compiler, libraries and runtime system to choose implementation details that are best suited to executing the program on particular hardware or in a particular operating environment. Unsurprisingly, when I googled for SequenceL I also found papers with names like: “Automatic parallel control structures in SequenceL” and “Automatic Concurrency in SequenceL”.

Anyway, it was a very stimulating paper. Unfortunately, it’s a restricted download on the IEEE site. If you’re a Sun employee, you can access the IEEE site without a special login, the instructions are available on the Sun internal network at the Sun Learning Services web site. I assume you can also purchase it online from the IEEE site, but I didn’t verify that.

Update: Very interesting.  Two days after I read this paper, I scanned through some slides being presented at Sun by Kunle Stanford Pervasive Parallelism Laboratory.  He discussed the future of highly parallel app development, and his vision includes more emphasis on Domain Specific Languages, using Scala as a base platform for the DSLs.  More support for the idea that current programming languages require you to put too much implementation knowledge into the application level. You can find a set of slides very similar to what he presented on the Scala web site.

C++ and OpenMP runtime libraries in Solaris

Friday, February 20th, 2009

There is a set of runtime libraries that are maintained by the compiler team, and delivered to Solaris so they can be made available in /usr/lib on all Solaris systems. The ones most likely to affect people are the OpenMP support library (libmtsk) and the C++ runtime libraries (libCstd, libCrun). What makes these libraries special is that they share interfaces with the compilers. The compilers implement language features by automatically inserting function calls to these libraries. This means that new features in the compilers can require new versions of these libraries to be installed in /usr/lib.

Updating the libraries The long-standing process for this is that users are expected to install the latest runtime library patches when they want to use a new release of the compilers, or when they want to run executables that were built with the latest release of the compiler.

This procedure has some problems.

1) The lead time, from last bug fix, to getting a Solaris patch available on sunsolve.com can be several months. The compiler team never knows when the last bug will be fixed in a compiler release, and they don’t know if that bug will require a library fix or not.

2) Hunting down the right runtime library patches, and installing them on all the machines that need them requires significant time and system administration skills.

Note: On Linux, we always have to include our own copies of the runtime libraries. That’s an added complication, but I’m discussing the problems related to skew between the compilers and the Solaris installation, and on Linux there is never any skew.

SPROtweak

We lived with these difficulties when our releases were 12-18 months apart, but we’ve been doing more frequent Express releases for several years now, and so we’ve been relying more and more on a workaround. Internally, we call that workaround SPROtweak after the name of the package that implements it. The SPROtweak package contains copies of all runtime libraries that the compiler team delivers to Solaris. We install this package inside the compiler install directory when we want that compiler to override the system library versions. Our current policy is to install SPROtweak for Express releases, but not for FCS releases. This workaround also has several problems:

1) Programs built with Express compilers depend on the compiler installation directory. If you run the executable on a different computer, it will try to fall back to the system libraries, but it may not work correctly.

2) Official Solaris patches to support Express releases may or may not be available in a timely fashion, so executables are only moderately portable to different Solaris hosts.

3) We use this workaround as a crutch inside Sun. We commonly install the SPROtweak package even in FCS versions of the compilers inside Sun, because we don’t want our users to have to install patches. Since the compilers inside Sun are often used from an NFS mounted directory, we can usually avoid version skew. But we’re essentially saying that we don’t want to deal with patches, but we’re willing to make end users deal with patches.

Sometimes we make a distinction between casual users and production users. The internal installations of the compilers are for casual users, so we don’t want to make them deal with patches. Production users will normally be expected to have their own installation of the tools, and have closer control over the patch levels of their build machines. Production users will have a sys-admin to maintain their build machines. This side-steps the issue of the many casual users that don’t work for Sun.

Are there any better solutions?

A) Always include the latest runtime libraries inside the compiler directory, and give users a simple command line option to use the system libraries or the local copies when creating an executable. This is fairly straightforward, but it doesn’t go very far towards creating a seamless solution.

B) Fix the Solaris patch process so it takes less time to get these fixes available in Solaris. This would make the situation on OpenSolaris pretty good, because availability is 99% of the problem. Updating the runtime libraries is much simpler (although you still have root permissions). But the situation on Solaris 10 is only half addressed, because it’s still a PITA to install the patches.

GCC?

I’d be interested in hearing the different ways that “version skew” is dealt with on Linux.

Future

In the long run, the way we will address this skew may be different on OpenSolaris and on Solaris 10 because the distribution mechanism is different.

Goodbye Solaris 9 (for Sun Studio)

Wednesday, August 20th, 2008

We’re making the internal transition to building Sun Studio on Solaris 10 (instead of Solaris 9). This is a big deal because the product bits immediately become useless on any Solaris 9 system. There’s a new libm.so.2 library that became available on Solaris 10, and if you depend on it, you can’t run on Solaris 9. It’s a challenge making sure our vast ocean of loosely maintained lab machines is ready for the change. The good news is we get to use newer, faster hardware. 🙂

I’ll make this post short because I’m using ScribeFire for the first time in forever, and I don’t trust it. I can’t believe blogging is still this hard. 🙁

Latest Sun dwarf extensions

Friday, May 11th, 2007

I’ve been working with the Sun lawyers and the Dwarf Standards Committee recently to change the overly zealous license on the Sun Dwarf Extensions document.  I think we’ve finally gotten it down to something reasonable.  Anyway, we’ve added a few twists for C++ and Fortran 90.  As an example, there are some new structures for identifying code segments that implement C++ destructors, and correlating them to the object being destroyed. So with that introduction, here is the latest document.

What do you call 64-bits?

Saturday, April 7th, 2007

You can use Sun Studio to build 32-bit programs and 64-bit programs.  So what is that choice called, anyway?  Is it “data model”, “memory model”, “data addressing model” ?  Is it an ABI?  And how is it related to those compilation styles on x86 where you can have large/small/medium/tiny memory models?  Is that a “memory model” too or a “code model”?  Our doc writer Richard started asking those questions, and Wikipedia wasn’t really that helpful in the terminology department. Are there any other good sources for choosing a terminology?  Check out Richard’s blog for more details.

Alan Burlison responded with:

ILP32 and LP64

See http://www.unix.org/version2/whatsnew/lp64_wp.html

Thanks for the pointer, Alan. They call it “model” and “programming model” most of the time. But that page also uses the terms “pointer model” and “data model” (once each).

Compiler options by category and platform

Saturday, April 7th, 2007

One of our doc writers, Richard, blogged about a project I helped out with.  It’s a nice index of compiler options sorted and arranged by many different aspects.  The source is stored in an XML document, and it’s processed using style sheets into a tree of HTML files.  We tried forever to  dynamically load the XML file and apply different style sheets to it, but we couldn’t get around the old  “can’t load foreign files” problem in html and javascript.  I kind of hacked around this in my bookmarks page, which I wrote a while back as a Javascript learning experience. The amount of pain and hackery that I needed to get that working was terrible.  The existing set of pages get generated by Kami or I from a shell script, but then they need to be “fiddled with” to meet the style criteria for developers.sun.com.  That process can sometimes take forever depending on the workload of the people who have to do the fiddling, and the process will have to be done the next time we post an update, unless we change our setup.   In the next version of the pages, I’m hoping we can set it all up in one large XHTML file, and use some Javascript menus to choose a style sheet to be applied dynamically.  We’ll still probably have to tweak the new page (once!) to meet the right style guides, but from then on, it should be easy to keep it up to date.  If the developers.sun.com web site doesn’t want Javascript running loose, we can post the index someplace else.  So if you have any feedback on the structure or contents, let me know.  But don’t get me started on deep-linking into the manuals on docs.sun.com, that was a huge hack to get working, and it’s not officially supported.

Update: I updated the link for the Sun Studio 12 compilers.