
Archive for the ‘soar’ Category

Scripting JSoar

August 30th, 2010

In simpler times (say 2001), Soar was just Tcl. That is to say, Soar was a module, dynamically loaded into a Tcl interpreter at run-time. When loaded, Soar added a bunch of useful commands to the interpreter, like run, matches, preferences, and, probably most importantly, the sp command. When you “sourced” a Soar file, the Tcl interpreter just executed commands, loading rules, setting watch levels, etc.

The main drawback to this whole situation was that Tcl didn’t always lend itself to friendly embedding in other programs. It had funny rules about threads and, if Tk was involved, demanded to have its message queue pumped. And, of course, very few people get to know Tcl enough to like it :)

On the other hand, you could create macros for repetitive Soar structures, define new RHS functions, manipulate I/O. In short, you had the power of a full programming language mixed in with your Soar code.

With Soar 8.6, Soar’s tight integration with Tcl was broken, replaced by SML and a stricter command interpreter. It still looked like Tcl commands, but there were no Tcl control structures, variables, etc. Way it goes. When I initially started work on JSoar, I needed to quickly bootstrap a command interpreter so I could load existing code into the kernel. I turned to Tcl, in the form of Jacl, a Java implementation of Tcl. It saved a lot of time and, since Soar’s syntax was still basically Tcl, no one would really notice.

Of course, as I mentioned, no one wants Tcl, so over the last couple weeks, I’ve added a new scripting layer to JSoar. This time, I’m taking advantage of the Java Scripting API, JSR-223. This allows any scripting language with a JSR-223 implementation to be pretty seamlessly accessed from Java (and vice versa). With this new capability, it’s now possible to automate Soar agents, implement simple agent environments, and extend SoarUnit testing to include I/O, all from within a Soar source file. All with a variety of languages including Ruby, JavaScript, Python, Clojure, Groovy, etc.

A scripting engine (a language implementation) is invoked with the script command:

script javascript {
   soar.onInput(function(e) {
      soar.wmes.add("my-input", "hello");
   });
   soar.onOutputCommand("say", function(e) {
      soar.print("The agent says: " + e.greeting);
   });
}

This little bit of code sets up an input phase callback and creates a WME on the agent’s input-link. It also handles an output command called “say”. The equivalent Java code would be … more ceremonious. Not to mention that setting up a new project, compiling, etc., is a major hassle.
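For comparison, here’s a rough sketch of what just the input half looks like in plain Java. The class and method names here (InputEvent, InputWmes, SoarEventListener, and so on) are recalled from the JSoar API of this era and should be treated as assumptions, not a verified listing.

import org.jsoar.kernel.Agent;
import org.jsoar.kernel.events.InputEvent;
import org.jsoar.kernel.io.InputWmes;
import org.jsoar.util.events.SoarEvent;
import org.jsoar.util.events.SoarEventListener;

// Sketch only: class and method names are assumptions about the JSoar API,
// not verified against a particular release.
public class InputExample
{
    public static void install(final Agent agent)
    {
        // Equivalent of soar.onInput(...) above: add a WME each input phase.
        agent.getEvents().addListener(InputEvent.class, new SoarEventListener()
        {
            public void onEvent(SoarEvent event)
            {
                InputWmes.add(agent.getInputOutput(), "my-input", "hello");
            }
        });
        // Handling the "say" output command takes yet another listener plus
        // manual walking of the output link -- omitted here.
    }
}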

As an example, I’ve implemented a simple waterjugs environment in JavaScript, Ruby, and Python. Here are some things you can do:

  • Generate input
  • Handle output commands
  • Auto-convert hashes (JavaScript objects, Python dicts, or Ruby hashes) to input structures
  • Install new RHS functions
  • Add new commands
  • and on and on

Also, with maybe a little more work, I might have a pretty good story for dealing with I/O in SoarUnit tests. Stay tuned.

More detailed info on JSoar scripting support can be found on the JSoar wiki.

Introducing SoarUnit

August 1st, 2010

The history of testing in Soar is short and not very happy. This is especially true for automated testing, where the tools have generally been ad hoc, proprietary, and hard to use. I am personally responsible for some of these tools. Despite all this, I’m foolishly taking another foray into the land of Soar unit testing. This new effort is called SoarUnit and is part of the JSoar suite. However, since the JSoar community is even tinier than the overall Soar community, I’ve made sure that SoarUnit is compatible with both Soar implementations.

(For more complete docs and examples, see the SoarUnit wiki page, and the Bebot source code. There’s a snapshot (20100801) of JSoar with the latest SoarUnit on the JSoar downloads page.)

SoarUnit UI

The one advantage that this effort may have over previous Soar testing attempts is that I developed it at the same time that I was actively working on Soar code. When I first started working on Soar99 and Bebot, my testing strategy was very ad hoc. I had a big Soar file that looked something like this:

   source "test1.soar"
   run
   excise --all

   source "test2.soar"
   run
   excise --all

   ... and so on ...

To run my tests, I’d just source the file and eyeball the trace for errors, which, if you’re familiar with Soar, you’ll know are usually pretty easy to spot. Of course, there were some drawbacks to this approach:

  • Manually checking for success/failure is painful. The “green bar” is so much nicer.
  • To add a single new test, I had to add another three lines to my “master test” file.
  • There was no way to put more than one test in a file, necessitating either a bunch of duplicate code, or some confusing trickery with the source command.
  • When a test failed, I had to manually source it to debug.

Basically, all the same problems you get when manually testing code in any other language.

So, with 30 or so ad hoc tests already built, I started working on SoarUnit. Here’s the basic structure of a testcase file:

setup {
   ... Code run before every test. Here you can source code under test,
       setup test data, propose initialization operators, etc ...
}

test "name of test" {
   ... Code for an individual test. Here you put any code
       you need for the test. When the test's success condition's
       have been detected, a rule should call the (pass) RHS function.
       Similarly, there's a (fail) RHS function for detecting failures ...
}

... More tests ...

Nice and simple.

I think the existing tests were key to how well development has gone. Every previous attempt I’ve made for testing Soar code has been based on imagining how someone might use such a framework, without any real-world requirements to build from. Although I’ve only been working on it a little more than a week, SoarUnit already has:

  • Test case discovery by file name pattern
  • Multiple tests per file
  • Setup blocks to handle code shared by multiple tests
  • A graphical interface similar to Eclipse’s JUnit view (see below)
  • Single-click debugging from the user interface
  • Support for both JSoar and CSoar 9.3.0 (CSoar requires that a SOAR_HOME environment variable be set so it can find native libraries and stuff)
  • Basic code coverage reporting

Best of all, I’ve used it a lot on Bebot and Soar99 and it’s honestly really nice. I did a major refactoring, basically renaming all of the major data structures and operators in the library, and it was a breeze. I guess that’s the point of having tests…

What’s Next?

I’m going to keep working on SoarUnit to support my own Soar development. There are still some obvious holes and open questions.

One area that’s always plagued talk of testing Soar code is I/O. How do you test an agent independently of the environment it will run in? I’m currently of the mind that, like unit testing in every other language, this is where mocking comes in. With a few rules, an agent can easily simulate static structures on the input-link to drive tests. If things get too complicated, one option is simple integration with soar2soar, where for each test, a helper agent would be created to simulate the environment. There are other options as well (plugins, external processes, etc), but none of them maintains the simplicity I want for SoarUnit. For every configuration parameter, you lose a user, and with Soar there aren’t that many to start with.

The other open question is whether SoarUnit is effective for testing idiomatic Soar code. Soar has very little encapsulation or modularity which can make it difficult to isolate code for testing. The problems I’m solving with Bebot are very procedural, so they’re easy to test, but I’m not sure that’s true for most Soar code. I’d like to work through creating tests for some of the Soar tutorial problems and see how it goes.

Categories: soar

Porting Soar to Java or: How I Learned to Stop Worrying and Love Spaghetti (Part 2)

December 26th, 2008

In the previous installment of this series, I wrote about some of the challenges of the initial port of the Soar cognitive architecture from C/C++ to Java. As I noted then, the approach I chose was bottom-up with minimal refactoring. With a couple months of work, I converted about 40k lines of C++ code to about 40k lines of Java code.

Actually, the overhead of stronger typing and the lack of macros and unions made the Java implementation generally a bit larger in terms of lines of code. I think the ability to reliably browse the code in Eclipse more than made up for the bloat.

Moving Spaghetti Around The Plate

The original Soar code base is an amalgam of different programming styles reflective of its history as a university research system. There are hints of object orientation as well as functional aspects (it was originally implemented in Lisp, of course), but for the most part it’s good old procedural code. Open data structures with various free functions performing operations on them. The code base itself is broken up into compilation units along mostly functional lines. There’s decide.cpp, which deals mostly with the decision process: substates, impasses, the goal dependency set, etc. There’s symtab.cpp which deals for the most part with allocating and wrangling Soar symbol structures. And on and on…

Of course, you need an object to kind of tie all these pieces together. In the case of Soar, there is the agent struct, aka One Struct To Rule Them All. The agent struct lives in agent.h of all places and is 639 lines of deliciously public members. Here’s a taste:

typedef struct agent_struct {
  /* After v8.6.1, all conditional compilations were removed
   * from struct definitions, including the agent struct below
   */

  /* ----------------------- Rete stuff -------------------------- */
  /*
   * These are used for statistics in rete.cpp.  They were originally
   * global variables, but in the deglobalization effort, they were moved
   * to the (this) agent structure.
   */
  unsigned long actual[256], if_no_merging[256], if_no_sharing[256];

  unsigned long current_retesave_amindex;
  unsigned long reteload_num_ams;
  alpha_mem **reteload_am_table;

  // ... #### 615 lines omitted for sake of brevity #### ...
  // JRV: Added to support XML management inside Soar
  // These handles should not be used directly, see xml.h
  xml_handle xml_destination;		// The current destination for all XML generation, essentially either == to xml_trace or xml_commands
  xml_handle xml_trace;				// During a run, xml_destination will be set to this pointer.
  xml_handle xml_commands;			// During commands, xml_destination will be set to this pointer.

} agent;
/*************** end of agent struct *****/

It’s a beast and it’s passed to just about every function in the system just in case that function may need access to just about anything.

In the interests of sanity, I took a fairly naive approach to the port. For each compilation unit (cpp file) I:

  • Created a Java class
  • Created a Java method for each function in the cpp file
  • Created Java member variables for each member of the old agent structure that seemed to be accessed more or less exclusively by that module

This approach gave me the warm and fuzzy feeling that I was breaking up that awful agent struct and making the system more modular. All my dreams of refactoring the spaghetti of the Soar kernel into a highly modular, easily extended and tested system were coming true…

Ok, maybe not. As I mentioned above, the kernel was only broken up across cpp files along functional lines. This meant that any member variable that I chose to move from the agent structure to the Java class corresponding to the cpp file still had to be public because it was likely that several other modules accessed it however they wanted.

I had taken a 10 Lbs wad of spaghetti and delicately teased it into 10 or so 1 Lbs wads. Each of these spaghetti-lets still maintained an array of strands connecting it to most of its siblings. I think a diagram is in order.

Here’s what I started with, 10 Lbs of spaghetti:

10 Lbs of Spaghetti

And here’s what I ended with, 10 little 1 Lbs spaghetti monster babies:

1 Lbs Spaghetti Babies, 10 of them

See what I mean? I’m really no closer to object orientation, encapsulation or anything. And, of course, the punchline is that I need a top-level object to stitch all these babies together. Can you guess what it’s called?

So, I have an Agent class. It contains a bunch of “module” objects which are all intertwined with each other and have to be public so that everyone can get at each other’s parts.  I’m pretty sure there’s a code smell here, but I can’t quite put my finger on it…

I actually have two goals here. First, I want to build a public interface for jsoar that is clean and clear and suitable for integrating intelligent Soar agents into cool systems. Second, I want an agent that’s nicely modularized and encapsulated so that the rete can be used (and tested!) on its own, etc. Of course, I don’t want to over-encapsulate either. Soar is first and foremost a research system which, in my opinion, means that encapsulation can often get in the way of getting things done.

For the first goal, a clean interface, I want the Agent class to be straightforward without a bunch of yucky public members or just as yucky public accessors.  I also want an interface that will allow me to refactor all these modules slowly over time without impacting external clients. Here I’ll describe my current approach to solving these two problems.

Using the Adapter Pattern to Hide Your Spaghetti

First, how do I give access to private members without cluttering up the interface with a bunch of getters?  For this problem, I chose to use the adapter pattern used liberally by the Eclipse framework. The basic idea is an interface like this:

public interface Adaptable
{
    Object getAdapter(Class<?> klass);
}

The getAdapter method takes a class as an argument and returns an instance of that class. Basically, you’re asking the adaptable object to turn itself into something else for you. In the case of the jsoar Agent, this is a great way to give access to internal modules without cluttering up the API. When one module needs access to another internal module, it can just ask for it by class name:

Decider decider = (Decider) agent.getAdapter(Decider.class);

Here Decider is an internal class. If you happen to know the password (Decider.class) you can get access to it. If you’re just a casual client building another demonstration of Missionaries and Cannibals, you’ll never be tempted by that public getDecider() method, because it’s not there. Yay!  This could also be implemented with a map and string keys, but I kind of like the adapter approach for its simplicity and type safety.
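A minimal sketch of how the back door might be wired up inside the agent; the registry map and register() method here are illustrative, not the actual jsoar internals:

// Illustrative sketch: the real jsoar Agent wires its modules differently,
// but the idea is the same -- a class-keyed registry behind getAdapter().
import java.util.HashMap;
import java.util.Map;

public class Agent implements Adaptable
{
    // Internal modules, keyed by class. None of them gets a public getter.
    private final Map<Class<?>, Object> modules = new HashMap<Class<?>, Object>();

    // Called once per internal module (Decider, Chunker, ...) at construction.
    void register(Object module)
    {
        modules.put(module.getClass(), module);
    }

    public Object getAdapter(Class<?> klass)
    {
        return modules.get(klass);
    }
}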

I realize I could also introduce an Agent interface where the private implementation has all the accessors and public members you could want. I will probably add such an interface as well, but I still like the approach of accessing this stuff only through the adapter. It also clearly illuminates the numerous dependencies between the internal modules in a way that I think getters would hide. It’s psychological :)

Hey, I was Eating That! Twiddling Your Secret Spaghetti

Now, there are a lot of places where an external client would like to twiddle the private parts of various internal modules. For example, to change the “wait on state-no-change” setting, client code really needs to be able to access Decider.waitsnc, which is a boolean member variable. Well, it seems like I just cut off that route in the previous section. Besides, I’m not really married to this whole Decider class thing anyway. It’s a monster and should probably be broken up into several smaller objects.  I could just add a getter/setter pair to the top-level Agent class.  There are dozens of these parameters though and I don’t want them cluttering up the interface.

My solution to this is a simple multi-layer property system. It provides type safety as well as affordances for high-performance parameters that are accessed frequently in inner loops. First we start off with a generic class that describes a single parameter/property, a PropertyKey. It’s basically like this:

class PropertyKey<T>
{
    public String getName();

    public T getDefaultValue();

    // ... etc ...
}

A PropertyKey is an immutable object. Instances are built with a convenient builder interface. They are meant to be instantiated as constants, i.e. static and final. A PropertyKey acts as a key into a map of property values managed by, of all things, a PropertyManager:

class PropertyManager
{
    public <T> T get(PropertyKey<T> key);
    public <T> T set(PropertyKey<T> key, T value);

    // ... etc ...
}

As you can see, this is all nice and typesafe. Now, what if we have a property that’s a flag, like “learning enabled” that’s checked frequently by internal code. In this case, for performance, we don’t want that inner loop constantly doing a map lookup, not to mention boxing and unboxing of the value. Enter the third interface, PropertyProvider:

public interface PropertyProvider<T>
{
    T get();
    T set(T value);
}

A property provider holds the actual value of the property, rather than the property manager holding it directly. Thus, in the Chunker module, our learning flag can be managed with a simple inner class:

public class Chunker
{
    // ...
    private boolean learningEnabled;
    private PropertyProvider<Boolean> learningEnabledProvider = new PropertyProvider<Boolean>() {
        public Boolean get() { return learningEnabled; }
        public Boolean set(Boolean value)
        {
            learningEnabled = value;
            return learningEnabled;
        }
    };
    // ...
}

Now, high-frequency code can access the learningEnabled member directly (through the getAdapter() back door), while low-frequency client code can access it through the PropertyManager interface. As a bonus, the property provider can do additional bounds checking on parameters and other fancy stuff. Best of all, our Agent interface isn’t faced with an ever growing set of arbitrary accessors. New properties can be added as needed without affecting other code. In fact, they can be added at run-time, if that’s ever necessary.
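To make the delegation concrete, here is a minimal sketch of how a manager might tie keys, plain values, and providers together. The setProvider() method and the fallback value map are illustrative assumptions, not the actual jsoar PropertyManager.

import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the layered lookup described above; not the
// actual jsoar PropertyManager.
public class PropertyManager
{
    private final Map<PropertyKey<?>, Object> values =
        new HashMap<PropertyKey<?>, Object>();
    private final Map<PropertyKey<?>, PropertyProvider<?>> providers =
        new HashMap<PropertyKey<?>, PropertyProvider<?>>();

    // A module (e.g. the Chunker) hands over the provider that owns the value.
    public <T> void setProvider(PropertyKey<T> key, PropertyProvider<T> provider)
    {
        providers.put(key, provider);
    }

    @SuppressWarnings("unchecked")
    public <T> T get(PropertyKey<T> key)
    {
        final PropertyProvider<T> provider = (PropertyProvider<T>) providers.get(key);
        if (provider != null)
        {
            return provider.get();
        }
        final Object value = values.get(key);
        return value != null ? (T) value : key.getDefaultValue();
    }

    @SuppressWarnings("unchecked")
    public <T> T set(PropertyKey<T> key, T value)
    {
        final PropertyProvider<T> provider = (PropertyProvider<T>) providers.get(key);
        if (provider != null)
        {
            return provider.set(value);
        }
        return (T) values.put(key, value);
    }
}

Client code then reads and writes through the manager using a key constant, while the Chunker keeps its plain boolean field for the inner loop.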

Oh, there’s more

So. Now I’m at a point where I have a pretty clean public interface for building jsoar-based systems. Beneath this clean API lurks a bunch of baby spaghetti monsters just dying to be refactored. I haven’t quite figured that part out yet, and so I’ll have to leave that story for another day.

Categories: java, soar, software engineering

My First Open Source Release

December 12th, 2008

This week, I released the first version of jsoar, my Java implementation of the Soar kernel.  Obviously, the Soar community is quite small, so the release is fairly low-key and low-stress.  Still, it’s the first time I’ve really done a release of an open source project of my own.  For promotion, I sent out an announcement to the main Soar mailing list.  I also gave a brief introduction and demonstration during lunch at my job, a Soar shop.  Overall, I think it was well received. Everyone seemed engaged and maybe even excited to give it a try.  I even got an unexpected, but very nice, pat on the back.

Probably the most interesting thing to come out of the release though was my choice of version number.  This may have been foolish, but I released jsoar as version 0.0.1.  One of my problems is a fear of overstating or exaggerating something, so I think my reasoning for this decision has something to do with managing expectations.  I don’t want someone to download it and be disappointed because the version number made it seem like more than it was.

That said, I was immediately chastised by some colleagues for choosing such a low version number. In their opinion, 0.0.1 says “I’ve barely finished writing the first module and you’re lucky if the code even compiles”.  I guess they have a point. Now that I think about it, I would think the same thing if I came across an open source project with a single 0.0.1 release.  jsoar is actually fully functional and ready to be used in real Soar projects, at least projects tolerant of a little risk.  Because it’s a direct port, a lot of the code ain’t pretty, but by the same token, it benefits from 20 odd years of debugging and optimization.

As Steve Yegge has pointed out, marketing is actually a pretty important skill for developers. So, I’m learning that lesson again.  Maybe in a couple weeks I’ll put out a new version and call it Soar 10.0, or jsoar 2009. Ok, maybe not that, but I think 0.6.0 is probably a good compromise. I think that says “this system is functional, but don’t be surprised if I change a bunch of stuff on you before the next release”, which is really what I was going for in the first place.

A brief postscript: My release timing seems fortuitous. The next day, another message was posted to soar-group asking if anyone had successfully compiled Soar in 64-bit mode. Sadly, the answer is no, owing to the C implementation’s frequent abuse of pointers and other architecture-dependent features. jsoar, of course, has none of those problems, and 64-bit support was one of my initial selling points of a Java implementation…

Porting Soar to Java or: How I Learned to Stop Worrying and Love Spaghetti (Part 1)

November 25th, 2008

I recently finished up a port of the Soar kernel from C/C++ to Java. The existing implementation is about 40,000 lines of C and C++ code accumulated over more than 15 years of development, mostly by grad students at the University of Michigan.   For such a long span of time, that’s not really that much code, but unlike a lot of “modern” object-oriented systems, every line does something important. There’s very little getter/setter boilerplate.

I won’t get into why I wanted to port it to Java here. That’s another story. Instead, I’ll focus on the software process.

From the start, I had some basic principles to follow. First, this was to be a port, not a reimplementation from scratch. Although there is a manual, wiki, and forthcoming book on the Soar architecture, there is nothing that I would call a specification for the language or run-time. That is to say, as is so often the case, the code is the spec. And there’s enough code of a sufficiently intricate nature that there’s minimal hope of generating a spec for Soar in any reasonable amount of time. It’s intractable. Besides, I wanted to use jsoar in the near future and with two children under two, I’m not spending my spare time writing a spec.

Second, because there is already a community of researchers working on the kernel, I wanted the port to be as lossless as possible. If the entire structure of the kernel changes out from under them, the chances of adoption are low.

Third, I wanted to actually test the Soar kernel. Historically, testing on the kernel has been minimal aside from manually running a few test cases to make sure nothing crashes. Since no one wants to write unit tests for a big legacy system, I might as well write some while I’m porting.

Finally, I wanted to be fairly conservative with refactoring during the port. In taking a bottom up approach to the port, I never knew when a  refactoring choice would come back to haunt me.  Furthermore, one argument for Java is that it’s easy (or easier) to refactor, so I might as well refactor at the end when I have a full view of the system. I’ll post more on how this has turned out later.

So, from these principles, I started out from Rivendell for Mordor. The upshot of the first tenet is kind of the funniest. My approach to the port came down to the following procedure:

  1. Open a cpp file, say rete.cpp
  2. Gasp at the majesty of a 9000 line cpp file
  3. Create class Rete in Java
  4. One by one, paste functions from C++ into Java, changing pointer arrows to dots, etc, etc

Union Busting

That last step was a doozy. The first few days were essentially building up data structures in Java and required the most actual thought. The Soar kernel makes heavy use of the old tagged union pattern. This generally maps pretty nicely to a type hierarchy with polymorphism, but there were a few stumbling blocks… In particular, there are a few Soar data structures that are “transmogrified” at run-time, i.e. their type is changed. Nice. In these cases, I settled for an “expanded union” where all of the unused data fields were null. This is the closest I could get in Java without some major refactoring of the code… and since I was taking a mostly bottom-up approach to the port, I was never sure when a refactoring decision would bite me at some later point.

Here’s an example of such a beast. You can see my assertions desperately trying to ensure that the “union” was indeed behaving like a union.
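As a hypothetical illustration of the pattern (not the actual jsoar symbol code), an expanded union looks something like this: all the variant fields sit side by side, exactly one is non-null at a time, and assertions police the invariant.

// Hypothetical illustration of the "expanded union" pattern; not the
// actual jsoar symbol code.
public class SymbolValue
{
    public String stringValue;   // set when this symbol is a string
    public Integer intValue;     // set when this symbol is an integer
    public Double floatValue;    // set when this symbol is a float

    // The assertions mentioned above: exactly one variant field may be set.
    public void validateUnion()
    {
        int used = 0;
        if (stringValue != null) used++;
        if (intValue != null) used++;
        if (floatValue != null) used++;
        assert used == 1 : "expanded union must have exactly one active field";
    }

    // Run-time "transmogrification": just swap which field is populated.
    public void transmogrifyToInt(int value)
    {
        stringValue = null;
        floatValue = null;
        intValue = value;
        validateUnion();
    }
}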

Function Pointers

The rete algorithm implementation in Soar makes heavy use of tables of function pointers. Since Java lacks function pointers, I took the fairly tedious route of defining an interface and implementing it once for each function pointer.  Basically, I ended up with a bunch of inner classes, one for each function pointer. This worked fine.

The interesting part came when the port was finished and I started performance testing. It turns out that method calls from a nested class to a containing class are actually pretty slow when called in an inner loop due to an additional level of method calls generated by the compiler. So, in the end, for many function pointer tables, I ended up reverting to simple switch statements which gave me something like a 20% speedup on some tests.
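Roughly, the two approaches looked like this; the node types and handler bodies are made up for illustration rather than lifted from the actual rete code.

// Illustrative only: node types and handler bodies are made up, not taken
// from the actual jsoar rete code.
public class ReteDispatchSketch
{
    enum NodeType { POSITIVE, NEGATIVE, PRODUCTION }

    // Approach 1: an interface standing in for the C function pointer,
    // implemented once per original function and indexed by node type.
    interface NodeHandler { void handle(Object node); }

    static final NodeHandler[] HANDLERS = new NodeHandler[] {
        new NodeHandler() { public void handle(Object node) { /* positive node */ } },
        new NodeHandler() { public void handle(Object node) { /* negative node */ } },
        new NodeHandler() { public void handle(Object node) { /* production node */ } },
    };

    static void dispatchViaTable(NodeType type, Object node)
    {
        HANDLERS[type.ordinal()].handle(node);
    }

    // Approach 2: the plain switch that ultimately won on performance.
    static void dispatchViaSwitch(NodeType type, Object node)
    {
        switch (type)
        {
        case POSITIVE:   /* positive node */   break;
        case NEGATIVE:   /* negative node */   break;
        case PRODUCTION: /* production node */ break;
        }
    }
}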

Enumerations

Enumerations are one area where I really did some refactoring. I kept the names the same, but I almost always converted enum or #define constant lists to Java enumerations. The ordinal() method provides access to the original integer value, if needed, but I found that with EnumSet and EnumMap, this was rarely necessary. The improved typesafety and overall debuggability didn’t hurt either.
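For example, a #define constant list becomes an ordinary enum (the constant names here are illustrative, not necessarily the kernel’s):

import java.util.EnumSet;

// Illustrative conversion of a #define constant list to a Java enum; the
// constant names are examples, not necessarily the kernel's.
public enum ImpasseType
{
    NONE, CONSTRAINT_FAILURE, CONFLICT, TIE, NO_CHANGE;

    public static void main(String[] args)
    {
        // ordinal() recovers the old integer value when some code still wants it...
        System.out.println(TIE.ordinal());

        // ...but EnumSet (and EnumMap) usually make the raw int unnecessary.
        EnumSet<ImpasseType> interesting = EnumSet.of(TIE, NO_CHANGE);
        System.out.println(interesting.contains(CONFLICT));
    }
}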

Macros

Another obvious area of difficulty is what to do with macros, especially “clever” macros. For example, here’s the insert_at_head_of_dll() macro from the C Soar kernel:

/* This macro cannot be easily converted to an inline function.
   Some additional changes are required.
*/
#define insert_at_head_of_dll(header,item,next_field_name,prev_field_name) { \
  ((item)->next_field_name) = (header) ; \
  ((item)->prev_field_name) = NIL ; \
  if (header) ((header)->prev_field_name) = (item) ; \
  (header) = (item) ; }

That’s fun, isn’t it?  I kept the comment (ca. 2003) for extra effect. This is a “generic” procedure for inserting an item at the head of a linked list.  You’ve got the list header, the item and then two special parameters. What do they do? They name the next and previous fields in the item structure used for the linked list. Why not fix the names so you can just define a C++ template? Or pass in the address of the next and previous links? There are probably a few reasons. One would be performance. When this code was written (early 90s) they were squeezing every bit of performance they could out of the code. More interesting though is why the names aren’t fixed. That’s easily answered with a look at the working memory element (WME) structure from the kernel:

typedef struct wme_struct {
  /* WARNING:  The next three fields (id,attr,value) MUST be consecutive--
     the rete code relies on this! */
  Symbol *id;
  Symbol *attr;
  Symbol *value;
  Bool acceptable;
  unsigned long timetag;
  unsigned long reference_count;
  struct wme_struct *rete_next, *rete_prev; /* used for dll of wmes in rete */
  struct right_mem_struct *right_mems;      /* used for dll of rm's it's in */
  struct token_struct *tokens;              /* dll of tokens in rete */
  struct wme_struct *next, *prev;           /* (see above) */
  struct preference_struct *preference;     /* pref. supporting it, or NIL */
  struct output_link_struct *output_link;   /* for top-state output commands */
  tc_number grounds_tc;                     /* for chunker use only */
  tc_number potentials_tc, locals_tc;
  struct preference_struct *chunker_bt_pref;

  /* REW: begin 09.15.96 */
  struct gds_struct *gds;
  struct wme_struct *gds_next, *gds_prev; /* used for dll of wmes in gds */
  /* REW: end   09.15.96 */

} wme;

Just this one data structure is potentially part of 1, 2, 3 linked lists (see the X_next and X_prev fields?) and holds the head for 1, 2, 3, 4 linked lists. So, how can this be improved? If I can’t improve it in C, I doubt I can implement it in Java. Well, the next and previous pointers can at least be pushed down into a structure:

template <typename T> struct list_member
{
   T next;
   T prev;
};

which will allow us to do something like this:

typedef struct wme_struct {
  // ... snip ...
  list_member<struct wme_struct*> rete_next_prev; /* used for dll of wmes in rete */
  struct right_mem_struct *right_mems;      /* used for dll of rm's it's in */
  struct token_struct *tokens;              /* dll of tokens in rete */
  list_member<struct wme_struct*> next_prev;          /* (see above) */

  // ... snip ...

  struct wme_struct *gds_next, *gds_prev; /* used for dll of wmes in gds */
} wme;

Now we have an actual object we can work with and write a single insert_at_head_of_dll() function, not a macro. And it will be typesafe… but we’re porting to Java here, and my number one goal is to port this thing, and my number two goal is to avoid manually expanding that linked list management code over and over again in Java.

My solution is encapsulated in two generic classes, ListHead and AsListItem. The former represents the head of a list, the latter an entry in the linked list, equivalent to list_member<T> above. AsListItem came about when I was starting to play around with Ruby and its approach to mixin classes (like “acts_as_taggable”). It seemed clever at the time, but now it just seems stupid.
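In rough outline, the two classes look something like this; field and method names are simplified, not the exact jsoar versions.

// Simplified sketch of the two list classes; the real jsoar versions carry
// more helpers, but the shape is the same.
class ListHead<T>
{
    AsListItem<T> first;
}

// Mixin-style wrapper: an object embeds one of these per list it belongs to,
// which is what lets a single WME or token sit in several lists at once.
class AsListItem<T>
{
    final T item;              // the owning object (a WME, a token, ...)
    AsListItem<T> next;
    AsListItem<T> previous;

    AsListItem(T item)
    {
        this.item = item;
    }

    // The one shared implementation of insert_at_head_of_dll().
    void insertAtHead(ListHead<T> head)
    {
        next = head.first;
        previous = null;
        if (head.first != null)
        {
            head.first.previous = this;
        }
        head.first = this;
    }
}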

Anyway, these classes allowed objects to be a part of multiple lists at a time while eliminating the need for duplicate code, but there was one more issue…

As it turns out, the objects that use these classes the most also happen to be the most frequently created, short-lived objects in Soar, in particular rete WMEs and tokens. ListHead and AsListItem objects were being created like crazy, reducing performance. So over the course of a few days, I slowly reverted the most performance-critical lists to raw linked list manipulation code, just duplicating the list maintenance. D’oh. It improved performance by around 40% though, so I can’t complain too much.

This seems like enough for now. In a later installment, I’ll talk about testing, getting Java to run as fast as C, and why refactoring the Soar kernel still seems hard even after it’s been moved to Java.

Links

Here are some random links about porting C/C++ to Java:

  • Jazillian – this guy’s put some serious thought into the issue. Plus, he occasionally stirs the pot on the ANTLR mailing list which is always fun.
  • Port Legacy C++ code to Java: Tales from a Trading Desk – the answer is here. The point about functional tests is important. Lucky for me, some functional tests already existed for Soar and I had help from a Soar “guru”.
Categories: c++, java, soar, software engineering

Why Java? … or … why not C++?

November 24th, 2008

In a recent meeting I was asked whether Java’s concurrency support was an advantage for jsoar over the existing C++ implementation. I kind of said yes, but was mostly incoherent because I’m always trying to avoid saying things that aren’t true.  This is probably a weakness. Anyway, I wrote the followup email below and I kind of liked it so I figured I’d post it … just for context, “kernel” here refers to the Soar kernel …

Regarding Java concurrency support vs. C/C++… In any discussion like this, you can argue that anything is possible in just about any language, which is why I was a little hesitant to make a firm statement. In this case, C++ (or at least its runtime libraries) provides the same basic concurrency primitives as Java, i.e. locks, condition variables, etc. However, Java has two advantages here. First is cross-platform support. The Java concurrency and memory model are specifications that are implemented the same on every platform. So while it’s possible to write a wrapper library that puts a common interface around, say, pthreads and the Win32 threading APIs, guaranteeing consistent semantics across both is notoriously difficult (e.g. http://www.cs.wustl.edu/~schmidt/win32-cv-1.html).

Boost and other libraries will cover many things, but you’re still left having to include the boost library and hoping that it plays nicely with whatever threading model any other library you’re using decides to use.

Furthermore, on top of the basic synchronization primitives, Java provides a really nice library of higher-level concurrency tools (http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/package-summary.html) like concurrent collections, thread-safe queues, thread pools, etc.  All of these can be implemented with pthreads, but it’s nice to have a standard set of tools that are used more or less uniformly throughout the community. The situation is similar to mid-90s C++, where everybody and their grandma had their own implementation of a string class because a standard hadn’t yet been established.
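For instance, a small worker pool feeding a thread-safe queue is a few lines with the standard library (a generic example, nothing Soar-specific):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Generic example of the higher-level java.util.concurrent tools mentioned
// above; nothing Soar-specific here.
public class WorkerPoolExample
{
    public static void main(String[] args) throws InterruptedException
    {
        final BlockingQueue<String> results = new LinkedBlockingQueue<String>();
        final ExecutorService pool = Executors.newFixedThreadPool(4);

        for (int i = 0; i < 10; i++)
        {
            final int job = i;
            pool.submit(new Runnable()
            {
                public void run()
                {
                    results.add("finished job " + job);
                }
            });
        }

        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println(results.size() + " results");
    }
}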

More generally, since I’ve been thinking about this quite a bit lately for some reason, the advantages of Java over C/C++ boil down to the following:

  • Managed execution with garbage collection. I know Bob and Jon are dreading porting the new waterfall back to C where they’re going to have to start reference counting symbols again.
  • Far superior tools. Setting aside the complexity of C++ itself, as long as there’s a pre-processor, you’ll never have refactoring, or even browsing, tools that you can really trust. Not to mention accurate code completion, etc. This is where I personally see a 5-10x productivity gain over C++ and I consider myself a pretty knowledgeable C++ programmer.
  • More flexible run-time architecture. The fact that I can just add another main() to a code base for testing without creating another project is great, not to mention the ability to drop a library on the class path and load it reflectively. Of course, this can be done in C++ with DLLs, but with all of the requisite cross-platform baggage, not to mention issues with memory management when each DLL may have its own heap, compiler support for exceptions across DLL boundaries, binary compatibility issues, etc.
  • Superior testing tools. cppunit works, but again, because of the run-time stuff above, you write as much boilerplate as actual test code and integration with the development environment is limited.

The more I’ve thought about this, the more the cute stuff like scripting language support, running in Java application servers, and even concurrency falls away. They’re all nice, but also possible with Soar’s SML Java (or C#, Python, etc) bindings (maybe with some tweaking).  So you’re left with the fact that development and testing on the kernel itself (and surrounding modules) is where Java really wins out.  If you want to do research on the kernel itself, it helps to not spend half of your time waiting for builds to finish, hunting through headers, or creating cross-platform wrappers for basic services.

Also, this isn’t really about Java. If jsoar was a completely personal project for me (i.e. I didn’t work at SoarTech), I’d probably try it in C#, which seems to be both mainstream (as opposed to other advanced languages like F#, Scala, etc) and ahead of Java in cool language features. It’s more about productivity in C++ versus almost any other language out there.

Categories: c++, concurrency, java, soar