Archive for the ‘Tech Trivia’ Category

Stupid UNIX Tricks #1 : LANG and shell scripts

Saturday, April 14th, 2012

If you’ve been using UNIX systems for a while (including Mac OS X, Linux or anything else remotely similar) you might know about the LANG environment variable.  It’s used to select how your computer treats language-specific features.  You can find out more than you ever wanted to know by looking here: http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap08.html

Mostly it doesn’t make much difference in your life, except that there are two commonly used default settings.  One common setting is LANG=C, which enables some very old-fashioned standard-conforming details and allows an implementation to skip lots of fancy language processing code.  Another common setting is LANG=en_US.UTF-8.  That setting tells the various system functions in libc to expect strings to be UTF-8 encoded and to follow US English conventions for things like sorting and character classes.

On the systems I use, it seems like the default is en_US.UTF-8.  But I suspect that most people must have LANG=C somewhere in their 20-year-old .login files, because I occasionally run into bugs where some script doesn’t work right unless you have LANG=C.

Here’s an example:


% mkdir test; cd test; touch Caa cbb
% export LANG=C
% echo [c-d]*
cbb
% export LANG=en_US.UTF-8
% echo [c-d]*
Caa cbb

So the range of characters from ‘c’ to ‘d’ includes the letter ‘C’ if you are in the en_US.UTF-8 locale.  Ugh.  It’s easy to get that wrong in your shell script someplace, and people do.
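One defensive option is to pin the locale for the commands a script runs, so the glob behaves the same for every user. Here is a minimal python sketch of that idea, assuming a POSIX sh is on the PATH. The helper names run_in_c_locale and demo are made up for illustration. It relies on LC_ALL taking precedence over LANG.

```python
import os
import subprocess
import tempfile


def run_in_c_locale(command, cwd=None):
    """Run a shell command with the locale pinned to C and return its stdout."""
    env = dict(os.environ)
    # LC_ALL overrides LANG and every other LC_* variable.
    env["LC_ALL"] = "C"
    result = subprocess.run(
        ["sh", "-c", command],
        cwd=cwd,
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout


def demo():
    """Recreate the two test files in a scratch directory and expand the glob."""
    with tempfile.TemporaryDirectory() as scratch:
        for name in ("Caa", "cbb"):
            open(os.path.join(scratch, name), "w").close()
        return run_in_c_locale("echo [c-d]*", cwd=scratch)
```

With the locale pinned this way, demo() should report only cbb, whatever LANG the caller happens to have set.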

Here’s an easier way to show why that happens:


% mkdir test1; cd test1; touch a A b B c C;
% export LANG=C
% ls
A  B  C  a  b  c
% export LANG=en_US.UTF-8
% ls
a  A  b  B  c  C

So you can see that the sort order of strings used by the ls command matches the character order that the shell uses to expand the character range construct of glob patterns.  I suppose it’s consistent.  But it’s one of the things that makes it a challenge to write shell scripts that are robust and portable to different users’ environments.
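The two orderings can be modeled in a few lines of python. This is only a simplified sketch: dictionary_key is a made-up stand-in that interleaves upper and lower case, not the real en_US collation rules, and in_range is a hypothetical helper that mimics how a [low-high] range is tested against an ordering.

```python
def c_locale_key(name):
    # Plain code point order: all upper-case ASCII letters sort before lower-case ones.
    return name


def dictionary_key(name):
    # Simplified stand-in for en_US ordering: letters are compared ignoring case,
    # with the lower-case form placed just before its upper-case twin.
    return [(ch.lower(), ch.isupper()) for ch in name]


def in_range(ch, low, high, key):
    """True if ch falls between low and high under the given ordering."""
    return key(low) <= key(ch) <= key(high)


names = ["a", "A", "b", "B", "c", "C"]
print(sorted(names, key=c_locale_key))    # ['A', 'B', 'C', 'a', 'b', 'c']
print(sorted(names, key=dictionary_key))  # ['a', 'A', 'b', 'B', 'c', 'C']
print(in_range("C", "c", "d", c_locale_key))    # False
print(in_range("C", "c", "d", dictionary_key))  # True
```

Under the interleaved ordering, "C" sits between "c" and "d", which is why the range picks it up.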

Table Based Collaboration

Monday, January 23rd, 2012

I’ve been supporting our department wiki for many years now.  The most used feature is basic rich text, as you would expect, but the next most popular feature is tables.

Over time, I’ve identified a particular kind of collaborative function that people engage in when they are coordinating activities.  I don’t have a good name for this activity, but I’ll call it “table based collaboration”.

In some corporate cultures this is still done by sending giant Excel spreadsheets around as email attachments.  This is the main option available when collab services (like wikis) are not available or practical for all the participants.  In this model, the owner of the process owns the spreadsheet and makes the updates based on email received from the participants.

A more collaborative approach uses any form of wiki to create tables on wiki pages.  Depending on the data, it’s also possible to use a bulleted list format instead of a table, but the data itself and the collaboration process are the same.  There’s a big jump in ease of use if the wiki supports rich text table editing.

I’ve tried about a half-dozen different wiki implementations of tables, and none of the rich text table editors are worth using in a production environment.  As soon as you do any kind of formatting, the entire table converts from wiki-syntax to raw html.  And after that point, the first formatting bug (that can’t be fixed by the rich text editor) becomes impossible to correct by direct-editing.

The lack of decent rich text table editing means that you need to stick with the wiki-syntax for tables, and edit them by hand.  This is workable, but forces the participants to have passable fluency in the wiki syntax and whatever foibles it has.

Another way of enacting table based collaboration is to use an actual database with a simple web interface.  We have several examples of this in my organization.  It’s generally implemented using an off-the-shelf database of some kind.  By definition, the table never needs to be joined with anything, and there’s only a single table.  If your “table based collaboration” sprouts any extra tables, then it turns into a “department web application” and it falls outside the realm of this discussion.

There is an endless supply of web application frameworks which have a simple process for creating a simple web app.  But the process of creating it still requires the owner of the process to learn the framework and generate the web app.  It also requires someone to set up and maintain the web application itself.  These solutions are not suitable for having a non-web-technical person set up a new table.

If you look at each of these mechanisms, each one has pros and cons.  Factors to look at are: 1) Does it require centralized infrastructure? 2) What are the platform/tool requirements placed on the participants?  The leader?

In the final analysis, I think something like a Google Docs spreadsheet provides a sweet spot of accessibility, formatting and overhead.  Unfortunately, it’s not appropriate for a department-level solution.  Using Google Apps for proprietary company data needs to be approved as a company-wide policy; you can’t just download it and start using it.  Approving it for use for company business is appropriate for some companies, and not for others.

What I’ve been looking for is a web-application that allows end users to define a set of columns using basic types (string, date, enumeration, etc) and provides a simple spreadsheet-like interface for adding/removing/modifying data.
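To make the idea concrete, here is a minimal python sketch of the data model such an application might sit on: one table, a list of typed columns, and rows that are checked against those columns. The names Column, Table, accepts and add_row are invented for illustration, and the web interface itself is left out.

```python
import datetime
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    kind: str            # one of "string", "date", "enum"
    choices: tuple = ()  # allowed values, used only when kind == "enum"

    def accepts(self, value):
        """Check a single cell value against this column's type."""
        if self.kind == "string":
            return isinstance(value, str)
        if self.kind == "date":
            return isinstance(value, datetime.date)
        if self.kind == "enum":
            return value in self.choices
        return False


@dataclass
class Table:
    columns: list
    rows: list = field(default_factory=list)

    def add_row(self, **values):
        """Append a row after checking it supplies every column with a valid value."""
        expected = {col.name for col in self.columns}
        if set(values) != expected:
            raise ValueError("row must supply exactly: %s" % sorted(expected))
        for col in self.columns:
            if not col.accepts(values[col.name]):
                raise ValueError("bad value for column %r" % col.name)
        self.rows.append(values)


# Example: a small sign-up sheet defined entirely by its owner.
signup = Table(columns=[
    Column("who", "string"),
    Column("due", "date"),
    Column("status", "enum", choices=("open", "done")),
])
signup.add_row(who="Ken", due=datetime.date(2012, 2, 1), status="open")
```

Because the column definitions are plain data, a non-web-technical owner could in principle create them from a form rather than by learning a framework.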

I’ve been so frustrated recently that I’ve been thinking about recommending that people go back to mailing around OpenOffice spreadsheets.  Some general purpose wikis get by with less-than-ideal behavior when two people make updates at the same time. So, in some cases the collaborative aspect of the solution (like wiki tables) costs more in synchronization headaches than what it would cost to have one person do all the updates.

 

The everpresent “util” module.

Sunday, September 12th, 2010

Every major library or application I write seems to have a module named “util” these days.  I think it represents a kind of “impedance mismatch” between the platform I’m using (C runtime, C++ runtime, python standard libraries) and the platform I *wish* I were using.

Recently, I’ve been writing python code that runs lots of little UNIX utilities.  You know, like: find, ls, chmod, etc, etc.  It’s the kind of code that might also be written as a shell script, but python is much nicer when the program gets larger than about a page.  If you’re running lots of utilities, you want a variety of ways to interact with them.

Sometimes, you don’t want to send it any input, sometimes you do, sometimes you are expecting one line of output.  Sometimes you’re expecting a list of lines.  Sometimes you’re going to check the return code, sometimes you’re not.  These functions are all just small wrappers around calls to the python subprocess module.  But if you’re writing a lot of code that uses them, it’s important to make that code readable, so you want to streamline away most of the goop for dealing with the subprocess module.
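As a rough illustration, wrappers of this kind might look like the sketch below. The names run_lines, run_line and run_status are hypothetical, and the sketch uses subprocess.run, which needs Python 3.5 or later.

```python
import subprocess


def run_lines(args, input_text=None, check=True):
    """Run a command and return its stdout as a list of lines."""
    proc = subprocess.run(
        args,
        input=input_text,            # None means: send no input
        stdout=subprocess.PIPE,
        universal_newlines=True,     # work with str instead of bytes
        check=check,                 # raise CalledProcessError on failure
    )
    return proc.stdout.splitlines()


def run_line(args, input_text=None):
    """Run a command that is expected to print exactly one line."""
    lines = run_lines(args, input_text)
    if len(lines) != 1:
        raise ValueError("expected one line of output, got %d" % len(lines))
    return lines[0]


def run_status(args):
    """Run a command, discard its output, and return only the exit code."""
    proc = subprocess.run(
        args, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return proc.returncode
```

A caller can then write run_line(["echo", "hello"]) or run_lines(["sort"], input_text="b\na\n") without repeating the subprocess plumbing each time.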

I have utility routines for creating temporary files and removing them all when the program exits. There are routines to keep me from adding a lot of obscure import statements to the top of most of my modules.

Here are some examples of what I’m using for now:

def gethostname():
   from socket import gethostname
   return gethostname()

def timestamp():
   import datetime
   return str(datetime.datetime.today())

Here’s a recipe that I got from stackoverflow.com.  I wanted the equivalent of “mkdir -p”, and you need a few lines to do that in python.

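def mkdir_p(dir):
  import errno
  import os
  try:
    os.makedirs(dir)
  except OSError as exc:
    # An existing directory is fine; anything else is a real error,
    # including a plain file that is already sitting at that path.
    if exc.errno == errno.EEXIST and os.path.isdir(dir):
      pass
    else:
      raise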
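On Python 3.2 or later the standard library covers this directly, since os.makedirs accepts exist_ok=True. A minimal sketch that keeps the same function name:

```python
import os


def mkdir_p(path):
    """Create path and any missing parents; do nothing if it already exists."""
    # exist_ok=True still raises if path exists but is not a directory.
    os.makedirs(path, exist_ok=True)
```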

There’s also code to do things that I’m convinced must have a better answer in python, but I haven’t found it yet.  So I isolate the hack to the util module.

def is_executable(file):
  import os
  S_IEXEC = 0o100  # owner-execute permission bit
  mode = os.stat(file).st_mode
  return bool(mode & S_IEXEC)
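The standard library does have names for this. The stat module defines S_IXUSR for the owner-execute bit, and os.access asks a slightly different question: whether the current process may execute the file. A short sketch of both, where can_execute is a made-up name:

```python
import os
import stat


def is_executable(path):
    """True if the owner-execute bit is set in the file's mode."""
    return bool(os.stat(path).st_mode & stat.S_IXUSR)


def can_execute(path):
    """True if the current process is allowed to execute path."""
    return os.access(path, os.X_OK)
```

The two can disagree, for example on a file that is executable only by its owner when someone else runs the check.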

Moving code in and out of my util module also prevents me from worrying so much about obscure modularity issues. Any code I don’t want to worry about today goes into the util module. When I know where it belongs, I can easily move it later. Of course, that’s much easier to do with python than in a language that uses header files like C or C++.

Virtualization terms

Wednesday, June 16th, 2010

Update: A newer version of this post (find it here) was recently created.

Okay, before I forget, I’m writing it all down.

We have to test against all this stuff, and it’s becoming more and more convenient to use virtualization as a way to share lab resources, so I figured I’d go make sense of all the terminology that’s flying around.  I understood 80% of it, but I could never understand all of it at once.  A lot of this was extracted from Wikipedia.

Here are the things that affect my life: Xen, VirtualBox, VMWare, LDOMs, Zones, Containers.

Hypervisor : Software that emulates a hardware platform, so that Operating Systems can run on top of it, as if they had hardware to run on.

OS Virtualization: When you have one OS (one kernel) running multiple user-spaces. Applications think they are on separate machines.

There are two kinds of Hypervisors, some run directly on hardware (Type 1), and some run as applications (Type 2).

With those terms defined, here is a description of the technologies, features, products that I listed at the top:

  • Hypervisors:
    • Running on hardware – Type 1 Hypervisor
      • Xen: Hypervisor that runs on hardware, supports x86 (aka Sun xVM)
      • LDOMs: Hypervisor that runs on hardware, supports SPARC
    • Running as an application – Type 2 Hypervisor
      • VirtualBox: Hypervisor that runs as an application, supports x86
      • VMWare: Hypervisor that runs as an application, supports x86
  • OS Virtualization
    • Solaris Containers/Zones

The terms “zone” and “container” seem to be interchangeable. I have not found a source that is both clear and authoritative that can tell me the difference.

Zones are capable of running different versions of Solaris inside one Global OS instance.

There are lots of things I glossed over here, but my goal was to keep it short and sweet.

Trivia:

  • You can run a specific old version of Linux inside a Solaris zone.
  • The VMWare company probably supports products on other chips than x86
  • There are lots of differences between the features of Xen and LDOMs that I didn’t discuss

OpenOffice loses this round

Thursday, July 16th, 2009

I use spreadsheets every now and then for pretty trivial things.  Recently I’ve been using google docs spreadsheets because they were online and editable from different locations easily.  A few days ago I tried to use OpenOffice for a fairly simple sheet.  I’ve used OpenOffice on and off for years and years without ever becoming a power user.  After 30 minutes of trying to work with my very simple data, I realized I’d spent 28 minutes trying to figure out how to do basic operations that I took for granted in google spreadsheets. So here are the first few things I tried to do that were not as simple as they need to be:

1) Create a header row.

In OpenOffice, this is a “Window” option, and you find it under “Window -> Freeze”.  In google, I don’t even remember doing it, I think it just happened automatically somehow. (Addition: Even after using Window->Freeze, when sorting you still have to check the hidden box “Range contains column headers”)

2) sort rows by the value in a chosen column.

In google docs, when you hover over a column header, you get a pull down arrow that lets you choose A-Z or Z-A. That’s all I’ve ever wanted to do.  In OpenOffice, there is a prominent A-Z icon in the toolbar which does something stupid. (Sort the selected column regardless of other data).  The sort rows feature is under “Data -> Sort” and brings up a popup to configure the sort.  More than I needed.

3) reorder columns

In google docs I just drag a column left or right where I want it.  In OpenOffice the only way I found was to copy the data out of the column, add a new column, and paste the data into the new column.

It would seem that spreadsheets are for manipulating tables of data, and it seems that there are many more small tables in the universe than large tables.  So why not optimize for quick and simple operations that casual users do all the time?

I guess I’ll stick with google docs for now.

Mac OS X — Dock review

Friday, May 22nd, 2009

I’ve been using Mac OS X 10.5 (Leopard) for a week or two as my main desktop environment, and I’m really liking the Dock for icons and such.  For the last 20 years, I’ve wanted a window manager that combined the quick-launch buttons with the running program icons.  I’ve finally gotten my wish.  But after using it a while, I think there are some rough edges.

Here’s my version of an overview of the Mac OS X Dock:

Icons are used to represent several kinds of objects. On the left side of the dock are objects that represent applications.  On the right side of the dock are several distinct kinds of objects: 1) the trash can, 2) folders, and 3) iconified windows.

Application objects represent shortcuts for starting an app, if it’s not already running.  If the app is already running, there’s a small visual indicator next to the icon, and clicking it brings up one of its windows (the main one, the last you had focus in? I’m not sure).  But the application’s windows also show up on the right hand side of the dock.  When you click on a folder icon, you get a very nice pop-out menu with icons for each object inside, very convenient!  When you click on the icon for a running app, AND that app has more than one window, you should get a pop-out menu letting you choose which window you want to select.  It seems like a no-brainer to me, it just makes the interface more consistent.  And the dock would also quit jumping around so much and jiggling left and right as you open and close windows.

I’ll keep my fingers crossed for Snow Leopard, the next version of OS X.

Twitter needs to be commoditized.

Saturday, April 25th, 2009

Twitter needs to be commoditized. What do I mean by that? I mean that the Twitter message streams need to interoperate with all my other message streams. Twitter is just a bunch of logical message streams from different people. I don’t really care if my messages are coming via twitter or RSS or IM.  Why?  I’ll tell you.

I variously use OpenSolaris, MacOS and Windows most every day, and Firefox/Thunderbird/OpenOffice is my common app platform.  So I’m using TwitterFox to keep up with twitter.  It’s very good as an entry level Twitter client, but now I’m tempted to use something I can customize a little more.  But I’ve already got daily messages coming through several other interfaces, and I don’t want another application.  All I want is access to the twitter message streams.

But wait, you say, twitter is different because you can read and respond instantaneously!  And it’s a multi-way conversation! And it’s limited to 140 characters! But is it really that different at heart from what’s come before? Thunderbird has little popup windows for new mail, and people frequently use email for nigh-instantaneous conversations.  Both IM and IRC are instantaneous and they support multi-way conversations. Why haven’t I heard more about IM and IRC gateways with Twitter? The vast majority of my IM and IRC messages are less than 140 characters, nothing new about that.

In my opinion, the defining feature of twitter is that the clients provide an all-in-one chatroom interface as the primary way of viewing the data, but you get to easily choose who’s in the chatroom.  That’s a feature that should already exist in IRC anyway, it’s just too painful to use in IRC clients.  Because twitter is frequently updated, it grabs people’s attention.  Because it grabs their attention, interactive conversations are facilitated.  So that’s the essence of Twitter: It’s a global chatroom where you subscribe to the people you want in it.  But that’s just a kind of user interface, it’s not inherent to the data feed.

Some of the people I follow on Twitter provide good technical tips and pointers. Some of them are personal friends, some post links to “cool stuff”.  Some of them post frequently, some of them post infrequently. Hmmmm, this is sounding like a breakdown of my various email-based filtered inboxes, and RSS reader tags, and my IM contact categories.

The message clients I use most these days are:

  • Cellphone SMS
  • RSS via Google reader (I use multiple computers, remember)
  • gmail (for personal email)
  • thunderbird (for work email)
  • Pidgin (IM, multiple accounts, work and personal servers, some IRC)

So why do I need another one?  The ones with the best features for managing message streams are gmail and any RSS reader. What I’d really like is one application that can manage all those message streams for me, and cross link them.  Anyone want to write me one?

For my own purposes, it would be easiest if this application was a program that could be run as a hosted service.  That makes it easy for it to be cross platform, like Google Reader.  But I’m not supposed to access work email except from approved sources, so having an app server read my work email for me is out. For that reason a complete solution would probably need to be a client-based app.

I spend much more hands-on time reading than I do writing.  So I’m prepared to completely blow off the integrated message creation parts, I’m just talking about reading here. It can just bring up Thunderbird to send email, or bring up twitter.com to update my twitter feed. The app would need to be able to read and correlate all the message stream technologies I’ve mentioned so far, and allow me to sort and group the various message streams mixed together.  I have a “friends” folder in my work email that has a small number of social emails.  I’d like that one folder from my work IMAP to be grouped with all my personal gmail folders.  I’d like to have views based on people, so that I can see all the message streams from my buddy Ken, regardless of where they came from (IM, GMail, IMAP, Twitter, and don’t forget SMS and IRC).  I don’t need it to connect all his accounts together, I can configure that.

Some of the message streams are things I’d like to promote to “pop-up” status, so a browser add-on component that talks to the client would be nice.  (Or just use the desktop native pop-up mechanism).

I’m subscribed to a fair number of high volume email lists at work, and I filter them off into separate email inboxes.  This works ok, but I’d really rather be reading those in an RSS reader, not an email app.  The user interface is structured in a more appropriate way in RSS readers.

Oh, and don’t forget NNTP.  I don’t use any NNTP streams right now because they require yet another client.  Even using thunderbird for NNTP pulls up a completely separate UI mode in thunderbird.  I’d totally love it if thunderbird had kill-files for IMAP messages, but it doesn’t yet.  By kill-files I mean: “type K to automatically junk all future emails in this thread”.  I don’t mean:  Set up a special filter with a special window and select subject line, and copy/paste the subject line, and remember to go back and prune your old filters, and remember to apply the filter to the specific folder you’re looking at.
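The kill-file behavior described here is small enough to sketch in python. This is only an illustration: KillFile and its methods are invented names, and messages are assumed to be dicts carrying a hypothetical "thread" key. A real mail client would have to derive the thread from headers such as References and In-Reply-To.

```python
class KillFile:
    """Remember junked threads and hide every later message in them."""

    def __init__(self):
        self._killed = set()

    def kill(self, thread_id):
        # What pressing K on a message would do: junk its whole thread.
        self._killed.add(thread_id)

    def visible(self, messages):
        # Drop any message whose thread has been killed.
        return [m for m in messages if m["thread"] not in self._killed]


inbox = [
    {"thread": "lunch", "text": "Where are we going?"},
    {"thread": "flamewar", "text": "You are wrong."},
    {"thread": "flamewar", "text": "No, you are wrong."},
]
kills = KillFile()
kills.kill("flamewar")
remaining = kills.visible(inbox)
```

The point is that the kill applies to the thread once and keeps applying, with no filter window to revisit.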

In my head it’s a very simple interface, you just zoom in and zoom out on your message streams.  If you zoom all the way in on one blog post, you get a stream starting with the original post, and followed by all the comments. If you zoom out, you’ll see all the posts in the blog, but none of the comments.  Zoom out again, and you see a sample of all the posts in that category of your RSS reader.  The organization is a tree, but it’s heavily cross-linked.  Message streams show up in more than one place. I can start at the top of “all work email”, then drill down to my “work/social” folder, then go sideways to all “social” streams, then drill down to a thread with Ken, then drill sideways (e.g. by clicking on Ken’s name) to all message streams where Ken participates, etc, etc. Nodes in the tree are automatically created according to the structure of the underlying sources, but I get to create additional nodes that combine the data from other preexisting nodes. I can also create additional nodes by creating keyword searches or filters on existing nodes.
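One way to picture that structure is the python sketch below. Everything in it is hypothetical: Message, StreamNode, the keep filter and the sample data are invented to illustrate the idea, and the sketch covers only reading, not fetching from any real service.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Message:
    source: str  # e.g. "imap" or "twitter"
    sender: str
    text: str


@dataclass
class StreamNode:
    name: str
    messages: List[Message] = field(default_factory=list)
    children: List["StreamNode"] = field(default_factory=list)
    keep: Optional[Callable[[Message], bool]] = None  # optional filter

    def all_messages(self):
        """Collect this node's messages plus everything below it."""
        found = list(self.messages)
        for child in self.children:
            found.extend(child.all_messages())
        if self.keep is not None:
            found = [m for m in found if self.keep(m)]
        # A cross-linked node can be reached twice; keep the first copy only.
        # (Two messages with identical fields also collapse in this sketch.)
        return list(dict.fromkeys(found))


inbox = StreamNode("work/inbox", messages=[
    Message("imap", "ken@work", "lunch?"),
    Message("imap", "boss@work", "status report"),
])
work_social = StreamNode("work/social", messages=[
    Message("imap", "ken@work", "game night"),
])
tweets = StreamNode("twitter", messages=[
    Message("twitter", "@ken", "new blog post"),
    Message("twitter", "@other", "cool link"),
])

# Nodes built from the structure of the sources.
work = StreamNode("all work email", children=[inbox, work_social])
# A combined node: work/social is cross-linked under both parents.
social = StreamNode("social", children=[work_social, tweets])
everything = StreamNode("everything", children=[work, social])

# A per-person view: a filter node, with the account mapping configured by hand.
ken_accounts = {"ken@work", "@ken"}
ken = StreamNode("Ken", children=[everything],
                 keep=lambda m: m.sender in ken_accounts)
```

Zooming out corresponds to calling all_messages() on a node higher up, and drilling sideways corresponds to jumping to another node that shares a child or a filter with the current one.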

Oh well, one day when I retire I’ll get a chance to work on it.  Until then, I’ll just keep bitching.  Someone please get cracking on this.  🙂 If it works right I’d pay a lot of money for it.

OpenID starting to take off (finally)

Sunday, December 24th, 2006

I found this on del.icio.us/popular: A video showing how to use OpenID to get a portable login that you can use with many different web sites.  One password, controlled from one spot. And you can get your free login identity from multiple different web sites offering the OpenID service.  Check it out.  Back to your holiday entertainment….