Porting Soar to Java or: How I Learned to Stop Worrying and Love Spaghetti (Part 2)

December 26th, 2008

In the previous installment of this series, I wrote about some of the challenges of the initial port of the Soar cognitive architecture from C/C++ to Java. As I noted then, the approach I chose was bottom-up with minimal refactoring. With a couple months of work, I converted about 40k lines of C++ code to about 40k lines of Java code.

Actually, the overhead of stronger typing, lack of macros and unions made the Java implementation generally a bit larger in terms of lines of code. I think the ability to reliably browse the code in Eclipse more than made up for the bloat.

Moving Spaghetti Around The Plate

The original Soar code base is an amalgam of different programming styles reflective of its history as a university research system. There are hints of object orientation as well as functional aspects (it was originally implemented in Lisp, of course), but for the most part it’s good old procedural code. Open data structures with various free functions performing operations on them. The code base itself is broken up into compilation units along mostly functional lines. There’s decide.cpp, which deals mostly with the decision process: substates, impasses, the goal dependency set, etc. There’s symtab.cpp which deals for the most part with allocating and wrangling Soar symbol structures. And on and on…

Of course, you need an object to kind of tie all these pieces together. In the case of Soar, there is the agent struct, aka One Struct To Rule Them All. The agent struct lives in agent.h of all places and is 639 lines of deliciously public members. Here’s a taste:

typedef struct agent_struct {
  /* After v8.6.1, all conditional compilations were removed
   * from struct definitions, including the agent struct below
   */

  /* ----------------------- Rete stuff -------------------------- */
  /*
   * These are used for statistics in rete.cpp.  They were originally
   * global variables, but in the deglobalization effort, they were moved
   * to the (this) agent structure.
   */
  unsigned long actual[256], if_no_merging[256], if_no_sharing[256];

  unsigned long current_retesave_amindex;
  unsigned long reteload_num_ams;
  alpha_mem **reteload_am_table;

  // ... #### 615 lines omitted for sake of brevity #### ...
  // JRV: Added to support XML management inside Soar
  // These handles should not be used directly, see xml.h
  xml_handle xml_destination;		// The current destination for all XML generation, essentially either == to xml_trace or xml_commands
  xml_handle xml_trace;				// During a run, xml_destination will be set to this pointer.
  xml_handle xml_commands;			// During commands, xml_destination will be set to this pointer.

} agent;
/*************** end of agent struct *****/

It’s a beast and it’s passed to just about every function in the system just in case that function may need access to just about anything.

In the interests of sanity, I took a fairly naive approach to the port. For each compilation unit (cpp file) I:

  • Created a Java class
  • Created a Java method for each function in the cpp file
  • Created Java member variables for each member of the old agent structure that seemed to be accessed more or less exclusively by that module

This approach gave me the warm and fuzzy feeling that I was breaking up that awful agent struct and making the system more modular. All my dreams of refactoring the spaghetti of the Soar kernel into a highly modular, easily extended and tested system were coming true…

Ok, maybe not. As I mentioned above, the kernel was only broken up across cpp files along functional lines. This meant that any member variable that I chose to move from the agent structure to the Java class corresponding to the cpp file still had to be public because it was likely that several other modules accessed it however they wanted.

I had taken a 10 Lbs wad of spaghetti and delicately teased it into 10 or so 1 Lbs wads. Each of these spaghetti-lets still maintained an array of strands connecting it to most of its siblings. I think a diagram is in order.

Here’s what I started with, 10 Lbs of spaghetti:

10 Lbs of Spaghetti

And here’s what I ended with, 10 little 1 Lbs spaghetti monster babies:

1 Lbs Spaghetti Babies, 10 of them

See what I mean? I’m really no closer to object orientation, encapsulation or anything. And, of course, the punchline is that I need a top-level object to stitch all these babies together. Can you guess what it’s called?

So, I have an Agent class. It contains a bunch of “module” objects which are all intertwined with each other and have to be public so that everyone can get at each other’s parts.  I’m pretty sure there’s a code smell here, but I can’t quite put my finger on it…
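
To make the smell concrete, here is a minimal sketch of what the naive port looks like in miniature. Only Agent, Decider and waitsnc come from the post; SymbolTable, symbolCount, makeSymbol and createSubstate are hypothetical stand-ins, and the real modules are far larger.

```java
// Hypothetical, heavily simplified stand-in for the class ported from symtab.cpp.
class SymbolTable {
    public int symbolCount;          // formerly a member of the agent struct

    public String makeSymbol(String name) {
        symbolCount++;
        return name;
    }
}

// Hypothetical, heavily simplified stand-in for the class ported from decide.cpp.
class Decider {
    public boolean waitsnc;          // public because other modules poke at it
    private final Agent agent;

    Decider(Agent agent) {
        this.agent = agent;
    }

    public String createSubstate() {
        // Reaches straight into a sibling module through the agent.
        return agent.symbols.makeSymbol("S" + (agent.symbols.symbolCount + 1));
    }
}

// The top-level object that stitches the modules together, all of them public.
class Agent {
    public final SymbolTable symbols = new SymbolTable();
    public final Decider decider = new Decider(this);
}

class SpaghettiDemo {
    public static void main(String[] args) {
        Agent agent = new Agent();
        System.out.println(agent.decider.createSubstate());
        System.out.println(agent.symbols.symbolCount);
    }
}
```

Every module holds the agent, and every field a sibling needs has to stay public, so the struct has been split up without anything actually being encapsulated.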

I actually have two goals here. First, I want to build a public interface for jsoar that is clean and clear and suitable for integrating intelligent Soar agents into cool systems. Second, I want an agent that’s nicely modularized and encapsulated so that the rete can be used (and tested!) on its own, etc. Of course, I don’t want to over-encapsulate either. Soar is first and foremost a research system which, in my opinion, means that encapsulation can often get in the way of getting things done.

For the first goal, a clean interface, I want the Agent class to be straightforward without a bunch of yucky public members or just as yucky public accessors.  I also want an interface that will allow me to refactor all these modules slowly over time without impacting external clients. Here I’ll describe my current approach to solving these two problems.

Using the Adapter Pattern to Hide Your Spaghetti

First, how do I give access to private members without cluttering up the interface with a bunch of getters?  For this problem, I chose to use the adapter pattern used liberally by the Eclipse framework. The basic idea is an interface like this:

public interface Adaptable
{
    Object getAdapter(Class<?> klass);
}

The getAdapter method takes a class as an argument and returns an instance of that class. Basically, you’re asking the adaptable object to turn itself into something else for you. In the case of the jsoar Agent, this is a great way to give access to internal modules without cluttering up the API. When one module needs access to another internal module, it can just ask for it by class name:

Decider decider = (Decider) agent.getAdapter(Decider.class);

Here Decider is an internal class. If you happen to know the password (Decider.class) you can get access to it. If you’re just a casual client building another demonstration of Missionaries and Cannibals, you’ll never be tempted by that public getDecider() method, because it’s not there. Yay!  This could also be implemented with a map and string keys, but I kind of like the adapter approach for its simplicity and type safety.
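
The post does not show how Agent implements getAdapter, so the following is only one plausible sketch: the agent keeps its modules in a private list and hands back the first one that is an instance of the requested class. The modules list, the stub Chunker and Decider bodies, and the choice to return null for unknown classes are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

interface Adaptable {
    Object getAdapter(Class<?> klass);
}

// Stub internal modules; the real ones are far larger.
class Decider {
    public boolean waitsnc = false;
}

class Chunker {
    public boolean learningEnabled = false;
}

class Agent implements Adaptable {
    // Internal modules are private: no getDecider(), no public fields.
    private final List<Object> modules = new ArrayList<>();

    Agent() {
        modules.add(new Decider());
        modules.add(new Chunker());
    }

    @Override
    public Object getAdapter(Class<?> klass) {
        // Return the first module that is an instance of the requested class.
        for (Object module : modules) {
            if (klass.isInstance(module)) {
                return module;
            }
        }
        return null; // assumption: an unknown class simply yields null
    }
}

class AdapterDemo {
    public static void main(String[] args) {
        Agent agent = new Agent();
        Decider decider = (Decider) agent.getAdapter(Decider.class);
        decider.waitsnc = true;
        // Asking again returns the same module instance.
        System.out.println(decider == agent.getAdapter(Decider.class));
    }
}
```

Because the lookup goes through Class.isInstance, a new internal module can be added without touching the public surface of Agent at all.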

I realize I could also introduce an Agent interface where the private implementation has all the accessors and public members you could want. I will probably add such an interface as well, but I still like the approach of accessing this stuff only through the adapter. It also clearly illuminates the numerous dependencies between the internal modules in a way that I think getters would hide. It’s psychological :)

Hey, I was Eating That! Twiddling Your Secret Spaghetti

Now, there are a lot of places where an external client would like to twiddle the private parts of various internal modules. For example, to change the “wait on state-no-change” setting, client code really needs to be able to access Decider.waitsnc, which is a boolean member variable. Well, it seems like I just cut off that route in the previous section. Besides, I’m not really married to this whole Decider class thing anyway. It’s a monster and should probably be broken up into several smaller objects.  I could just add a getter/setter pair to the top-level Agent class.  There are dozens of these parameters though and I don’t want them cluttering up the interface.

My solution to this is a simple multi-layer property system. It provides type-safety as well as affordances for high-performance parameters that are accessed frequently in inner loops. First we start off with a generic class that describes a single parameter/property, a PropertyKey. It’s basically like this:

class PropertyKey<T>
{
    public String getName();

    public T getDefaultValue();

    // ... etc ...
}

A PropertyKey is an immutable object. Instances are built with a convenient builder interface. They are meant to be instantiated as constants, i.e. static and final. A PropertyKey acts as a key into a map of property values managed by, of all things, a PropertyManager:

class PropertyManager
{
    public <T> T get(PropertyKey<T> key);
    public <T> T set(PropertyKey<T> key, T value);

    // ... etc ...
}
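
The outlines above leave the bodies out, so here is a minimal runnable sketch of the two classes working together. The builder method names, the getType method, the "waitsnc" key name and the decision that set returns the previous value are assumptions for illustration, not the jsoar implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Immutable description of one property; created through a small builder.
final class PropertyKey<T> {
    private final String name;
    private final Class<T> type;
    private final T defaultValue;

    private PropertyKey(Builder<T> builder) {
        this.name = builder.name;
        this.type = builder.type;
        this.defaultValue = builder.defaultValue;
    }

    static <T> Builder<T> builder(String name, Class<T> type) {
        return new Builder<>(name, type);
    }

    public String getName() { return name; }
    public Class<T> getType() { return type; }
    public T getDefaultValue() { return defaultValue; }

    static final class Builder<T> {
        private final String name;
        private final Class<T> type;
        private T defaultValue;

        private Builder(String name, Class<T> type) {
            this.name = name;
            this.type = type;
        }

        Builder<T> defaultValue(T value) {
            this.defaultValue = value;
            return this;
        }

        PropertyKey<T> build() {
            return new PropertyKey<>(this);
        }
    }
}

// Map-backed manager; the key's type token keeps get() free of unchecked casts.
class PropertyManager {
    private final Map<PropertyKey<?>, Object> values = new HashMap<>();

    public <T> T get(PropertyKey<T> key) {
        Object value = values.get(key);
        return value != null ? key.getType().cast(value) : key.getDefaultValue();
    }

    // Assumption: set returns the previous value, in the style of Map.put.
    public <T> T set(PropertyKey<T> key, T value) {
        T old = get(key);
        values.put(key, value);
        return old;
    }
}

class PropertyKeyDemo {
    // Keys are meant to be constants, i.e. static and final.
    static final PropertyKey<Boolean> WAITSNC =
        PropertyKey.builder("waitsnc", Boolean.class).defaultValue(false).build();

    public static void main(String[] args) {
        PropertyManager properties = new PropertyManager();
        System.out.println(properties.get(WAITSNC)); // default until someone sets it
        properties.set(WAITSNC, true);
        System.out.println(properties.get(WAITSNC));
    }
}
```

Since the value type is part of the key, a call like properties.set(WAITSNC, "yes") fails at compile time rather than at run time.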

As you can see, this is all nice and typesafe. Now, what if we have a property that’s a flag, like “learning enabled”, that’s checked frequently by internal code? In this case, for performance, we don’t want that inner loop constantly doing a map lookup, not to mention boxing and unboxing of the value. Enter the third interface, PropertyProvider:

public interface PropertyProvider<T>
{
    T get();
    T set(T value);
}

A property provider holds the actual value of the property, so the property manager doesn’t store the value itself and simply delegates to the provider. Thus, in the Chunker module, our learning flag can be managed with a simple inner class:

public class Chunker
{
    // ...
    private boolean learningEnabled;
    private PropertyProvider<Boolean> learningEnabledProvider = new PropertyProvider<Boolean>() {
        public Boolean get() { return learningEnabled; }
        public Boolean set(Boolean value)
        {
            boolean old = learningEnabled;
            learningEnabled = value;
            return old; // assumed: set() returns the previous value
        }
    };
}

Now, high-frequency code can access the learningEnabled member directly (through the getAdapter() back door), while low-frequency client code can access it through the PropertyManager interface. As a bonus, the property provider can do additional bounds checking on parameters and other fancy stuff. Best of all, our Agent interface isn’t faced with an ever growing set of arbitrary accessors. New properties can be added as needed without affecting other code. In fact, they can be added at run-time, if that’s ever necessary.
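
Putting the three pieces together, the sketch below shows one way a manager could delegate to module-owned providers. The setProvider method, the fallback Holder class, the "max-chunks" property with its bounds check, and the previous-value return from set are all hypothetical; PropertyKey is also cut down to a plain constructor to keep the example short.

```java
import java.util.HashMap;
import java.util.Map;

// Cut-down key class: a constructor stands in for the builder.
final class PropertyKey<T> {
    private final String name;
    private final T defaultValue;

    PropertyKey(String name, T defaultValue) {
        this.name = name;
        this.defaultValue = defaultValue;
    }

    public String getName() { return name; }
    public T getDefaultValue() { return defaultValue; }
}

interface PropertyProvider<T> {
    T get();
    T set(T value);
}

class PropertyManager {
    private final Map<PropertyKey<?>, PropertyProvider<?>> providers = new HashMap<>();

    // Hypothetical hook: a module registers the provider that owns the value.
    public <T> void setProvider(PropertyKey<T> key, PropertyProvider<T> provider) {
        providers.put(key, provider);
    }

    public <T> T get(PropertyKey<T> key) { return providerFor(key).get(); }

    public <T> T set(PropertyKey<T> key, T value) { return providerFor(key).set(value); }

    @SuppressWarnings("unchecked") // setProvider only pairs a key with a provider of the same T
    private <T> PropertyProvider<T> providerFor(PropertyKey<T> key) {
        PropertyProvider<?> provider = providers.get(key);
        if (provider == null) {
            // No module owns this property, so keep it in a plain holder.
            provider = new Holder<T>(key.getDefaultValue());
            providers.put(key, provider);
        }
        return (PropertyProvider<T>) provider;
    }

    private static final class Holder<T> implements PropertyProvider<T> {
        private T value;
        Holder(T initial) { value = initial; }
        public T get() { return value; }
        public T set(T newValue) { T old = value; value = newValue; return old; }
    }
}

class Chunker {
    static final PropertyKey<Boolean> LEARNING = new PropertyKey<>("learning", false);
    static final PropertyKey<Integer> MAX_CHUNKS = new PropertyKey<>("max-chunks", 50);

    private boolean learningEnabled = LEARNING.getDefaultValue();
    private int maxChunks = MAX_CHUNKS.getDefaultValue();

    Chunker(PropertyManager properties) {
        properties.setProvider(LEARNING, new PropertyProvider<Boolean>() {
            public Boolean get() { return learningEnabled; }
            public Boolean set(Boolean value) {
                boolean old = learningEnabled;
                learningEnabled = value;
                return old;
            }
        });
        properties.setProvider(MAX_CHUNKS, new PropertyProvider<Integer>() {
            public Integer get() { return maxChunks; }
            public Integer set(Integer value) {
                // Bounds checking lives with the module that owns the value.
                if (value < 1) {
                    throw new IllegalArgumentException("max-chunks must be at least 1");
                }
                int old = maxChunks;
                maxChunks = value;
                return old;
            }
        });
    }

    // Inner-loop code reads the primitive field directly: no map lookup, no boxing.
    boolean isLearningEnabled() { return learningEnabled; }
}

class PropertyProviderDemo {
    public static void main(String[] args) {
        PropertyManager properties = new PropertyManager();
        Chunker chunker = new Chunker(properties);
        properties.set(Chunker.LEARNING, true);
        System.out.println(chunker.isLearningEnabled());
    }
}
```

Client code only ever sees PropertyManager and the key constants, while an out-of-range call such as properties.set(Chunker.MAX_CHUNKS, 0) is rejected by the provider before the field changes.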

Oh, there’s more

So. Now I’m at a point where I have a pretty clean public interface for building jsoar-based systems. Beneath this clean API lurks a bunch of baby spaghetti monsters just dying to be refactored. I haven’t quite figured that part out yet, so I’ll have to leave that story for another day.

Categories: java, soar, software engineering