I’ve been thinking recently about the fact that the average piece of software code includes instructions to the compiler mixed together with instructions that should be executed at runtime.  Type declarations are instructions to the compiler. Most of the general sequential code is instructions that should be executed at runtime.  It occurred to me these are just two of many facets of software.  It would be nice to enable all these facets to be mixed together into one document so that the author of the software can keep all the facets consistent.

Another facet is specifications or unit tests.  I group those together because the way they’re tied to software at the code level is very similar, and they serve similar purposes.  There is an approach to coding called “Test Driven Design” where unit tests are written simultaneously with individual chunks of code.  There is a variant of this called “Behavior Driven Design”.  I was exposed to BDD in the latest Scala book (Programming Scala) and that was when I realized the TDD is really about verifiable specification, not so much about testing.

I really don’t want to use something that’s just a “programming language”, I want to use a “Software Authoring System”.

So what are the facets that a good “Software Authoring System” needs?

Runtime instructions: The purest expression of this facet is in dynamically typed languages, because they omit static type declarations.

Compiler instructions: Static type declarations for variables are instructions to the compiler. Type definitions themselves (in static or dynamic languages) are partly for the benefit of the compiler, and partly for the specification of runtime behavior.  Explicit testing and runtime manipulation of types (metaprogramming) uses types as part of the runtime behavior of the program. Virtual dispatch uses type information to determine runtime behavior. But non-virtual dispatch is really just a hint to the compiler about what code is going to be associated with what data. The behavior of such code is wired down at compile time.  The compiler uses it to optimize, and report programming errors back to the user.

The way that instructions are provided to the compiler should be rethought.  The declarative style of such instructions should be retained, but the functionality should be extensible through code that’s integrated with the project code. If I don’t like the way the static type system works (as supplied by the environment), I should be able to write extensions to it that will be executed by the compiler when it compiles my code. Among other benefits, this would allow me to implement better Domain Specific Languages and add better support for static analysis tools.  Moving the language complexity associated with static typing into a user-extensible library would also streamline the core language specification. The implementation of this feature would be more natural in a language where the compiler could just as easily interpret code as compile it, like dynamic interpreted languages.

Documentation: Embedding chunks of documentation inside your source code is a good start, and extracting method signatures is also useful (ala javadoc).  But a truly integrated system could provide much more information about interface specifications, preconditions, postconditions, etc.

Interface specification: If a public function takes arguments including a list and an integer that must be less than or equal to the length of the list, how does the author encode that information into the source code?  They can put it in comments.  They can add an assert statement (which will likely be ignored by the compiler, optimizer, documentation system etc). They can use TDD to create a test case that ensures the module throws an exception if the precondition is violated.  None of that goes far enough.  This kind of specification needs to be supported directly by the programming language and tied into the other facets of programming.

Module definition: The source code structures used to create a piece of software (a reusable module) are often not the same structures that you want to use to control how that software is used by other components.  That’s why programming languages support Classes and instances for object oriented design, and also support some concept of modules or packages for controlling the import and export of software interfaces to other components.  In most cases, this module/package support is a very thin layer glued onto the outside of a programming language.  For example, when creating a shared library on UNIX, there are linker-specific ways to enumerate which symbols are visible to consumers of the library.  There are also platform-specific hooks to allow this information to be passed into the linker from the source code, but again, it’s not truly integrated into the programming language.

Optimization: The author of a component usually needs to concern themselves (at some level) with basic choices that affect performance.  In some cases this requires in-depth bit-twiddling and care selection of compiler options, but in some cases it just means choosing data structure implementations that are appropriate for the task at hand.  As an author, I don’t want to have to use the Makefile to assign different optimization levels to different source files. I’d like to declare that a particular chunk of code needs to be heavily optimized and have the compiler just do the right thing.

Binding: Two kinds of binding are important to me as an author. My component will need to bind to the implementations of the interfaces it needs to be complete. Also, other components will bind to my component. The way I facilitate external binding is really part of the “module definition” facet discussed above.  The binding facet concerns itself with how to control the way that a component binds to its own required components. In some cases you want this binding to be via copy inclusion (think of archive libraries or combining .class files into a .jar file). In some cases you want to bind to external component, and include a version dependency in your component’s binary representation. In my source code I can say “import webservices.securesocket”, should I really have to go over to the makefile or build.xml in order to specify which version of the package I need to depend on?  A lot of what currently needs to go in the build structures should be folded into a new style of “rich” source code.

Static analysis: There’s a very useful development cycle in statically typed languages where you iterate between the compiler and editor while the compiler tells you about static typing errors in your source code.  But there are other static analysis tools besides the compiler.  The “compiler instructions” programming facet that I described above should be extended to include giving instructions to multiple static analysis tools in a unified way. Alternatively, the compiler can be extended as a universal front-end to outside analysis tools. Either way, this facet of programming should be expanded.

With this understanding as a basis, the next exercise would be to define a streamlined language that could be used as the basis for this kind of modern authoring system.