Archive

Archive for November, 2008

Steve Yegge and (Guitar) Gearheads

November 30th, 2008 No comments

A recent StackOverflow podcast featured Steve Yegge. It’s great stuff. A perfect mix of technical wisdom and insider gossip. In one portion of the discussion, Steve and Joel are discussing script kiddies and Steve starts to make a analogy to young guitarists. Unfortunately, he’s either interrupted or diverted. So, I’m going to try to make his point for him which I’m sure he’ll appreciate.

So, in the guitar world, young players (and many older ones) tend to fixate on gear to the detriment of musical fundamentals.  They always know about the latest pedals, picks, strings, amps, and, of course, guitars. They buy this stuff and fiddle around with it, all the while ignoring the fact that they have terrible technique, terrible ears, or just bad taste.

I know this because not only have I seen it, I’ve done it myself. I spent a lot of time in high school messing with distortion pedals, buying books of tablature, switching picks and strings, and believing that I’d be better if I had a better guitar.  (In guitar circles this is commonly known as Gear Acquisition Syndrome, or GAS). What I really needed was to sit down with a guitar, a metronome, and blank staff paper for transcription. Way it goes. I eventually learned my lesson, but it’s much harder to learn in your twenties than your teens.

The analogy to programming is the tendency of programmers to flit from one bleeding edge technology (c++, java, agile, rails, dependency injection, ajax, …) to the next looking for a silver bullet rather than honing their basic programming skills. I’m all for applying libraries to make life easier, but frequently it just boils down to basic design skills which you can’t get from the getting started page of some new widget.

I do this in my programming life too … maybe that last system turned into spaghetti because I didn’t use JAX-B … or maybe such and such book will suddenly make writing concurrency a snap. It usually doesn’t work out that way. Only when I actually sit down and do a really honest post-mortem, focusing on design, algorithm selection, usability, testing do I actually end up a little further ahead on the next project.

So what’s my current guitar rig? I have a guitar (Gibson ES-137) plugged straight into an amp (Roland Cube 30). No pick and heavy flat-wound strings. That’s been my basic, happy configuration for several years now. No fuss, no muss. The best part is that I now know what a truly terrible guitaris I am … d’oh.

Categories: software engineering Tags:

Use enum to define JTable columns

November 26th, 2008 No comments

Last week while tediously defining another Swing TableModel, I had a little epiphany. Typically, I’d define column headers, types, etc with a list of integer constants, and some arrays:

public class MyTableModel extends AbstractTableModel
{
    private final int NAME_COLUMN = 0;
    private final int VALUE_COLUMN = 1;
 
    private final String NAMES[] = { "Name", "Value" };
    private final Class CLASSES[] = { String.class, Double.class };

    . . .

   public String getColumnName(int columnIndex)
   {
      return NAMES[columnIndex];
   }

   . . .
}

This code is pretty tedious to maintain. In particular, switching column order involves a bunch of changes that are easy to get wrong. How about this instead… use an enum to define the columns!!

public class MyTableModel extends AbstractTableModel
{
    private static enum Columns
    {
        Name(String.class), Value(Double.class);

        final Class klass;

        Columns(Class klass)
        {
             this.klass = klass;
        }
    }

    . . .

   public int getColumnCount() { return Columns.values().length; }

   public Class getColumnClass(int columnIndex)
   {
      return Columns.values()[columnIndex].klass;
   }

   public String getColumnName(int columnIndex)
   {
      return Columns.values()[columnIndex].toString();
   }

   . . .
}

Now rearranging column order just works. Furthermore, you can add whatever column-specific functionality you like as methods on the enum. I think this approach can be generified with an interface for the enum to implement and a new abstract table model base class that can handle all the boilerplate above (getColumnCount(), getColumnName(), etc). When I get around to trying it out, I’ll post an update.

Late breaking: Of course, a quick search reveals I’m not the first person to think of this. Typical.

Categories: java, swing Tags: ,

Nice intro to JRuby/Java integration

November 25th, 2008 1 comment

Mikio Braun has written a good summary on calling Java code from JRuby. A lot of this material is covered on the JRuby Wiki, but Mikio adds some much needed exposition. Thanks Mikio!

Categories: java, jruby Tags: ,

Porting Soar to Java or: How I Learned to Stop Worrying and Love Spaghetti (Part 1)

November 25th, 2008 No comments

I have recently just finished up a port of the Soar kernel from C/C++ to Java. The existing implementation is about 40,000 lines of C and C++ code accumulated over more than 15 years of development, mostly by grad students at the University of Michigan.   For such a long span of time, that’s not really that much code, but unlike a lot of “modern” object-oriented systems, every line does something important. There’s very little getter/setter boilerplate.

I won’t get into why I wanted to port it to Java here. That’s another story. Instead focus on the software process.

From the start, I had some basic principles to follow. First, this was to be a port, not a reimplementation from scratch. Although there is a manual, wiki, and forthcoming book on the Soar architecture, there is nothing that I would call a specification for the language or run-time. That is to say, as is so often the case, the code is the spec. And there’s enough code of a sufficiently intricate nature that there’s minimal hope of generating a spec for Soar in any reasonable amount of time. It’s intractable. Besides, I wanted to use jsoar in the near future and with two children under two, I’m not spending my spare time writing a spec.

Second, because there is already a community of researchers working on the kernel, I wanted the port to be as lossless as possible. If the entire structure of the kernel changes out from under them, the chances of adoption are low.

Third, I wanted to actually test the Soar kernel. Historically, testing on the kernel has been minimal aside from manually running a few test cases to make sure nothing crashes. Since no one wants to write unit tests for a big legacy system, I might as well write some while I’m porting.

Finally, I wanted to be fairly conservative with refactoring during the port. In taking a bottom up approach to the port, I never knew when a  refactoring choice would come back to haunt me.  Furthermore, one argument for Java is that it’s easy (or easier) to refactor, so I might as well refactor at the end when I have a full view of the system. I’ll post more on how this has turned out later.

So, from these principles, I started out from Rivendell for Mordor. The upshot of the first tenet is kind of the funniest. My approach to the port came down to the following procedure:

  1. Open a cpp file, say rete.cpp
  2. Gasp at the majesty of a 9000 line cpp file
  3. Create class Rete in Java
  4. One by one, paste functions from C++ into Java, changing pointer arrows to dots, ec, etc

Union Busting

That last step was a doozy. The first few days were essentially building up data structures in Java and required the most actual though. The Soar kernel makes heavy use of the old tagged union pattern. This generally maps pretty nicely to a type hierarchy with polymorphism, but there were a few stumbling blocks… In particular, there are a few Soar data structures that are “transmogrified” at run-time, i.e. their type is changed. Nice. In these cases, I settled for an “expanded union” where all of the unused data fields were null. This is the closest I could get in Java without some major refactoring of the code… and since I was taking a mostly bottom up approach to the port, I was never sure when a refactoring decision would bite me at some later point.

Here’s an example of such a beast. You can see my assertions desperately trying to ensure that the “union” was indeed behaving like a union.

Function Pointers

The rete algorithm implementation in Soar makes heavy use of tables of function pointers. Since Java lacks function pointers, I took the fairly tedious route of defining an interface and implementing it once for each function pointer.  Basically, I ended up with a bunch of inner classes, one for each function pointer. This worked fine.

The interesting part came when the port was finished and I started performance testing. It turns out that method calls from a nested class to a containing class are actually pretty slow when called in an inner loop due to an additional level of method calls generated by the compiler. So, in the end, for many function pointer tables, I ended up reverting to simple switch statements which gave me something like a 20% speedup on some tests.

Enumerations

Enumerations are one area where I really did some refactoring. I kept the names the same, but I almost always converted enum or #define constant lists to Java enumerations. The ordinal() method provides access to the original integer value, if needed, but I found that with EnumSet and EnumMap, this was rarely necessary. The improved typesafety and overall debuggability didn’t hurt either.

Macros

Another obvious area of difficulty is what to do with macros, especially “clever” macros. For example, here’s the insert_at_head_of_dll() macro from the C Soar kernel:

/* This macro cannot be easily converted to an inline function.
   Some additional changes are required.
*/
#define insert_at_head_of_dll(header,item,next_field_name,prev_field_name) { \
  ((item)->next_field_name) = (header) ; \
  ((item)->prev_field_name) = NIL ; \
  if (header) ((header)->prev_field_name) = (item) ; \
  (header) = (item) ; }

That’s fun, isn’t it?  I kept the comment (ca. 2003) for extra effect. This is a “generic” procedure for inserting an item at the head of a linked list.  You’ve got the list header, the item and then two special parameters. What do they do? They name the next and previous fields in the item structure used for the linked list. Why not fix the names so you can just define a C++ template? Or pass in the address of the next and previous links? There are probably a few reasons. One would be performance. When this code was written (early 90s) they were squeezing every bit of performance they could out of the code. More interesting though is why the names aren’t fixed. That’s easily answered with a look at the working memory element (WME) structure from the kernel:

typedef struct wme_struct {
  /* WARNING:  The next three fields (id,attr,value) MUST be consecutive--
     the rete code relies on this! */
  Symbol *id;
  Symbol *attr;
  Symbol *value;
  Bool acceptable;
  unsigned long timetag;
  unsigned long reference_count;
  struct wme_struct *rete_next, *rete_prev; /* used for dll of wmes in rete */
  struct right_mem_struct *right_mems;      /* used for dll of rm's it's in */
  struct token_struct *tokens;              /* dll of tokens in rete */
  struct wme_struct *next, *prev;           /* (see above) */
  struct preference_struct *preference;     /* pref. supporting it, or NIL */
  struct output_link_struct *output_link;   /* for top-state output commands */
  tc_number grounds_tc;                     /* for chunker use only */
  tc_number potentials_tc, locals_tc;
  struct preference_struct *chunker_bt_pref;

  /* REW: begin 09.15.96 */
  struct gds_struct *gds;
  struct wme_struct *gds_next, *gds_prev; /* used for dll of wmes in gds */
  /* REW: end   09.15.96 */

} wme;

Just this one datastructure is potentially part of 1, 2, 3 linked lists (see the X_next and X_prev fields?) and holds the head for 1, 2, 3, 4 linked lists. So, how can this be improved? If I can’t improve it in C, I doubt I can implement it in Java. Well, the next and previous pointers can at least be pushed down into a structure:

template <typename T> struct list_member
{
   T next;
   T prev;
};

which will allow us to do something like this:

typedef struct wme_struct {
  // ... snip ...
  list_member<struct wme_struct*> rete_next_prev; /* used for dll of wmes in rete */
  struct right_mem_struct *right_mems;      /* used for dll of rm's it's in */
  struct token_struct *tokens;              /* dll of tokens in rete */
  list_member<struct wme_struct> next_prev;           /* (see above) */

  // ... snip ...

  struct wme_struct *gds_next, *gds_prev; /* used for dll of wmes in gds */
} wme;

Now we have actual object we can work with and write a single insert_at_head_of_dll() function, not macro. And it will be typesafe… but, we’re porting to Java here, and my number one goal is to port this thing, and my number two goal is to avoid manually expanding that linked list management code over and over again in Java.

My solution is encapsulated in two generic classes, ListHead and AsListItem. The former represents the head of a list, the latter an entry in the linked list equivalent to list_member<T> above. AsListItem came about when I was starting to play around with Ruby and it’s approach to mixin classes (like “acts_as_taggable”). It seemed clever at the time, but now it just seems stupid.

Anyway, these classes allowed objects to be a part of multiple lists at a time while eliminating the need for duplicate code, but there was one more issue…

As it turns out, the objects that use these classes the most also happen to be the most frequently created, short-lived objects in Soar, in particular rete WMEs and tokens. ListHead and AsListMember object were being created like crazy, reducing performance. So over the course of a few days, I slowly reverted the most performance-critical lists to raw linked list manipulation code, just duplicating the list maintenance. D’oh. It improved performance by around 40% though, so I can’t complain too much.

This seems like enough for now. In a later installment, I’ll talk about testing, getting Java to run as fast as C, and why refactoring the Soar kernel still seems hard even after its been moved to Java.

Links

Here are some random links about porting C/C++ to Java:

  • Jazillian – this guy’s put some serious thought into the issue. Plus, he occasionally stirs the pot on the ANTLR mailing list which is always fun.
  • Point Legacy C++ code Java: Tales from a Trading Desk – the answer is here. The point about functional tests is important. Lucky for me, some functional tests already existed for Soar and I had help from a Soar “guru”
Categories: c++, java, soar, software engineering Tags: , , ,

Google Collections API

November 24th, 2008 No comments

One aspect of the jsoar implementation that I wasn’t sure about in the beginning was the management of symbols. In the Soar kernel, a symbol represents an id, attribute or value in a working memory element. They boil down to either identifiers, or wrappers around atomic data types like string, double, or int.  The kernel caches symbols so that they can be compared quickly using pointer identity rather than the usual equals() and hashCode() approach.  These comparisons are the “inner loop” of the rete algorithm. The cache also helps with memory consumption where it’s not uncommon for a symbol, like input-link to appear in many rules.

In the C implementation of the Soar kernel, symbols are tediously reference counted. When the reference count goes to zero, the symbol is no longer in use and can be removed from the cache. In jsoar, I obviously didn’t want to do reference counting. We’re trying to make things easier here. But, if the symbols are cached, they’ll always be referenced and thus, never removed from the cache and garbage collected.

Before I go further, the symbol cache is basically a map from atomic value (like a string) to symbol, something like Map<String, StringSymbol> strings .

So, I started hunting around and quickly discovered Java’s support for weak references (I cheated a bit because I already knew they existed, just hadn’t used them before). A weak reference is a reference that does not stop the garbage collector from reclaiming an object. Nice.  Even better, Java includes WeakHashMap which sounds kind of like what I want, i.e. a map that will remove entries when a value is garbage collected. Unfortunately, WeakHashMap puts a weak reference on keys rather than values which is the opposite of what I want.

Next, I started looking into using a ReferenceQueue to implement what I wanted. This didn’t look that fun, but thankfully I stumbled upon the Google Collections API. It includes a Map implementation for all combinations of key/value reference types. Better yet, it even works.  Here’s what one of the map declarations looks like:

    private static <K, V> Map<K, V> newReferenceMap()
    {
        return new ReferenceMap<K, V>(ReferenceType.STRONG, ReferenceType.WEAK);
    }

    private Map<String, StringSymbolImpl> symConstants = newReferenceMap();

I used a static helper method to make the instantiation a little cleaner since I had to do it once for each type of symbol.

I mentioned that it even works. Here’s my unit test that I used to convince myself:

    @Test
    public void testGarbageCollectedSymbolsAreRemovedFromCache()
    {
        for(int i = 0; i < 1000; ++i)
        {
            assertNotNull(syms.createInteger(i));
            assertNotNull(syms.createString(Integer.toString(i)));
        }
        // Why do I believe this test works? Because it fails if I remove the
        // call to the garbage collector here :)
        System.gc();
        for(int i = 0; i < 1000; ++i)
        {
            assertNull(syms.findInteger(i));
            assertNull(syms.findString(Integer.toString(i)));
        }
    }

Anyway, that was a bunch of time saved. As a bonus, Google Collections includes a number of other useful functions and classes. All of the stuff you wish was in Java’s Collections class, but isn’t. Especially handy was the composite iterator implementation. You give it a list of iterators and it returns an iterator that will iterate over each in turn. This was great for creating a natural interface for some of Soar’s weirder iterable data structures, like working memory.

Categories: java Tags:

Why Java? … or … why not C++?

November 24th, 2008 No comments

In a recent meeting I was asked whether Java’s concurrency support was an advantage for jsoar over the existing C++ implementation. I kind of said yes, but was mostly incoherent because I’m always trying to avoid saying things that aren’t true.  This is probably a weakness. Anyway, I wrote the followup email below and I kind of liked it so I figured I’d post it … just for context, “kernel” here refers to the Soar kernel …

Regarding Java concurrency support vs. C/C++… In any discussion like this, you can argue than anything is possible in just about any language which is why I was a little hesitant to make a firm statement. In this case, C++ (or at least its runtime libraries) provide the same basic concurrency primitives as Java, i.e. locks, condition variables, etc. However, Java has two advantages here. First is cross-platform support. The Java concurrency and memory model are specifications that are implemented the same on every platform. So
while it’s possible to write a wrapper library that puts a common interface around say pthreads and the Win32 threading APIs, guaranteeing consistent semantics across both is notoriously difficult (e.g. http://www.cs.wustl.edu/~schmidt/win32-cv-1.html).

Boost and other libraries will cover many things, but you’re still left having to include the boost library and hoping that it plays nicely with whatever threading model any other library your using decides to use.

Furthermore, on top of the basic synchronization primitives, Java provides a really nice library of higher-level concurrency tools (http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/package-summary.html) like concurrent collections, thread-safe queues, thread pools, etc.  All of these can be implemented with pthreads, but it’s nice to have a standard set of tools that are used more or less uniformly throughout the community. The situation is similar to the mid-90s C++ where everybody and their grandma had their own implementation of a string class because a standard hadn’t yet been established.

More generally, since I’ve been thinking about this quite a bit lately for some reason, the advantages of Java over C/C++ boil down to the following:

  • Managed execution with garbage collection. I know Bob and Jon are dreading porting the new waterfall back to C where they’re going tohave to start reference counting symbols again
  • Far superior tools. Setting aside the complexity of C++ itself, as long as there’s a pre-processor, you’ll never have refactoring, or even browsing, tools that you can really trust. Not to mention accurate code completion, etc. This is where I personally see a 5-10x productivity gain over C++ and I consider myself a pretty knowledgeable C++ programmer.
  • More flexible run-time architecture. The fact that I can just add another main() to a code base for testing without creating another project is great, not to mention the ability to drop a library on the class path and load it reflectively. Of course, this can be done in C++ with DLLs, but with all of the requisite cross-platform baggage, not to mention issues with memory management when each DLL may have its own heap, compiler support for exceptions across DLL boundaries, binary compatibility issues, etc.
  • Superior testing tools. cppunit works, but again, because of the run-time stuff above, you write as much boilerplate as actual test code and integration with the development environment is limited.

The more I’ve thought about this, the cute stuff like scripting language support, running in Java application servers, and even concurrency falls away. They’re all nice, but also possible with the Soar’s SML Java (or C#, Python, etc) bindings (maybe with some tweaking).  So you’re left with the fact that development and testing on the kernel itself (and surrounding modules) is where Java really wins out.  If you want to do research on the kernel itself, it helps to not spend half of your time waiting for builds to finish, hunting through headers, or creating cross-platform wrappers for basic services.

Also, this isn’t really about Java. If jsoar was a completely personal project for me (i.e. I didn’t work at SoarTech), I’d probably try it in C#, which seems to be both mainstream (as opposed to other advanced
languages like F#, Scala, etc) and ahead of Java in cool language features. It’s more about productivity in C++ versus almost any other language out there.

Categories: c++, concurrency, java, soar Tags: ,

Pepys’ Diary

November 23rd, 2008 No comments

Lately, I’ve been subscribed to Samuel Pepys’ Diary. He “was an English naval administrator and Member of Parliament, who is now most famous for his diary”.  Because I’m not that smart, the language is a bit tough to read. The interesting this is the way it’s lining up with current events.  With all the recent turmoil, the collapse of my 401k, and the impending global depression, reading his daily posts about life in London during The Plague really puts things in perspective :)

As and extra bonus, Samuel Pepys appears in Neal Stephenson‘s Baroque Cycle, which chronicles the same time period. I wasn’t smart enough to finish that either, but I blame the rain. I left “King of the Vagabonds” in the rain, so lost a few weeks of reading time while I waited for it to dry out.  By the time I got back to it, I had no idea what was going on any more.

Categories: misc Tags: