This page contains a simple RL grid path-finding demo applet and
application implemented using the classes from my rl
package. The GUI is rather primitive, but I didn't want to invest
much time in the interface, because 1) the AWT appears to be the
least stable part of the Java standard library, and 2) in its present
form it is still an Awkward Window Toolkit to some extent. I wrote the demo
during two or three weeks of the 1997 summer holidays to learn Java
(this is my first experience with the language) and to entertain
myself by programming just for fun. I haven't used this code for any
purpose except just playing with it a little (all my previous research
code was written in C++), but maybe I will. Of course I realize that
using Java for numerical computation is problematic, but the
"inefficiency" of (current implementations of) the language is not
always an issue.
Both this demo and the underlying reinforcement learning code require Java 1.1 and won't run under Java 1.0.2. The applet has been tested only with HotJava, and there are probably some problems with Netscape. If the applet doesn't work, you can download the code and run it locally as an application.
OK, the demo is mostly self-explanatory, so just a very short user's guide. The learner (red) is required to find a shortest path to a goal cell (green). It cannot move into cells occupied by obstacles (black). A trial is a sequence of steps that begins in some fixed or random (see below) initial state and ends when a goal cell is reached. The learner receives a reward of -1 at each step except for the final one, when the reinforcement is 0.
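The reward scheme above can be sketched in a few lines of Java. This is only an illustration and not code from the rl package: the class GridSketch, its fields, the action numbering, and the convention that a blocked move leaves the learner where it is are all assumptions made for the example.

```java
// Hypothetical sketch of the grid task; none of these names come from the rl package.
class GridSketch {
    // Actions 0..3 as (row delta, column delta): up, down, left, right (assumed numbering).
    static final int[][] MOVES = {{-1, 0}, {1, 0}, {0, -1}, {0, 1}};

    final boolean[][] obstacle; // true where a cell is occupied by an obstacle (black)
    final boolean[][] goal;     // true where a cell is a goal cell (green)
    int row, col;               // current position of the learner (red)

    GridSketch(boolean[][] obstacle, boolean[][] goal, int startRow, int startCol) {
        this.obstacle = obstacle;
        this.goal = goal;
        this.row = startRow;
        this.col = startCol;
    }

    boolean atGoal() {
        return goal[row][col];
    }

    // Performs one step and returns the reward: 0 if the step ends in a goal cell, -1 otherwise.
    // Assumption: a move off the grid or into an obstacle leaves the learner in place.
    int step(int action) {
        int r = row + MOVES[action][0];
        int c = col + MOVES[action][1];
        boolean inside = r >= 0 && r < obstacle.length && c >= 0 && c < obstacle[0].length;
        if (inside && !obstacle[r][c]) {
            row = r;
            col = c;
        }
        return atGoal() ? 0 : -1;
    }
}
```

With these rewards a trial of n steps earns an undiscounted total of -(n - 1), so a learner that maximizes its total reward is looking for a shortest path.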
The "library" demonstrated by this demo consists of two main packages,
rl and rl.gui, and a small package called
util with some auxiliary stuff that I found useful but
missing from the standard Java library. (One day I will probably add
some prefix to these package names, such as
PL.something.) The reinforcement learning code is quite
simple, but flexible. It does not apply any "formal" design patterns
(not explicitly, at least), but it is hopefully object-oriented to
some extent. It does not follow the "standard" RL interface published
on Rich Sutton's home page, but could be easily adapted to conform to
it. The sources can be downloaded and freely
used.
The demo uses most of the functionality available in the
rl and rl.gui packages. The grid environment
is used mainly because it is easy to simulate and visualize in a nice
way. However, to implement a similar demo for another task, say
cart-pole balancing, you only need to define and "plug in"
two classes: CartPole for the simulation of the task and
CartPoleCanvas for its visualization. Everything else
(including all the GUI stuff) should work with absolutely no change.
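The "plug in" idea can be illustrated roughly as follows. The interface and class names below (TaskSketch, CorridorSketch, CorridorViewSketch, TrialRunnerSketch) are made up for this example and are not the actual types of the rl and rl.gui packages, which are described in the javadoc documentation. A toy corridor task stands in for CartPole, and a text rendering stands in for CartPoleCanvas.

```java
// Hypothetical task interface; the real rl package defines its own types.
interface TaskSketch {
    int numActions();
    int reset();            // starts a trial and returns the initial state index
    int step(int action);   // performs an action and returns the new state index
    double reward();        // reward for the most recent step
    boolean trialEnded();
}

// Toy task: a corridor of cells, with the goal at the right end.
class CorridorSketch implements TaskSketch {
    private final int length;
    private int position;
    private double lastReward;

    CorridorSketch(int length) { this.length = length; }

    public int numActions() { return 2; } // 0 = left, 1 = right
    public int reset() { position = 0; lastReward = 0.0; return position; }
    public int step(int action) {
        int next = position + (action == 1 ? 1 : -1);
        if (next >= 0 && next < length) position = next; // moves off the corridor are ignored
        lastReward = trialEnded() ? 0.0 : -1.0;
        return position;
    }
    public double reward() { return lastReward; }
    public boolean trialEnded() { return position == length - 1; }
    int length() { return length; }
    int position() { return position; }
}

// Task-specific view; a real one would presumably paint on an AWT Canvas instead.
class CorridorViewSketch {
    String render(CorridorSketch task) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < task.length(); i++) sb.append(i == task.position() ? 'L' : '.');
        return sb.toString();
    }
}

// Generic code that sees only the interface, so it needs no change when the task is swapped.
class TrialRunnerSketch {
    // Runs one trial with a fixed action and returns the number of steps taken.
    static int runTrial(TaskSketch task, int action, int maxSteps) {
        task.reset();
        int steps = 0;
        while (!task.trialEnded() && steps < maxSteps) {
            task.step(action);
            steps++;
        }
        return steps;
    }
}
```

The point of the design is that only the two task-specific classes know anything about corridors (or cart-poles); the runner, and by analogy the learning code and the rest of the GUI, work through the interface alone.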
The learning algorithm used in this demo is Sarsa combined with TTD to implement lambda>0. These two are described, respectively, in:
A Boltzmann-distribution action selection mechanism is used. The
"library" also provides Q-learning, and it is straightforward
to implement other well-known algorithms (say, AHC). You can also
implement some other temporal credit assignment mechanism instead of
TTD without changing the implementation of these algorithms. But I am
going too far into "technical" details, which are covered by the
javadoc-generated documentation of the code.
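For readers who want a concrete picture before opening the javadoc, here is a minimal sketch of Boltzmann action selection and of a tabular Sarsa update. It is not the code from the rl package: the class and method names are invented, the parameter names (temperature, alpha, gamma) are assumptions, and the update shown is plain one-step Sarsa (lambda = 0), without TTD.

```java
import java.util.Random;

// Hypothetical sketch; names and signatures are not taken from the rl package.
class SarsaSketch {
    // Boltzmann probabilities: p(a) is proportional to exp(Q(a) / temperature).
    static double[] boltzmann(double[] q, double temperature) {
        double max = q[0];
        for (int i = 1; i < q.length; i++) max = Math.max(max, q[i]);
        double[] p = new double[q.length];
        double sum = 0.0;
        for (int i = 0; i < q.length; i++) {
            // Subtracting the maximum does not change the distribution but avoids overflow.
            p[i] = Math.exp((q[i] - max) / temperature);
            sum += p[i];
        }
        for (int i = 0; i < q.length; i++) p[i] /= sum;
        return p;
    }

    // Draws an action index according to the probabilities in p.
    static int sample(double[] p, Random rng) {
        double u = rng.nextDouble();
        double acc = 0.0;
        for (int i = 0; i < p.length; i++) {
            acc += p[i];
            if (u < acc) return i;
        }
        return p.length - 1; // guards against rounding error in the cumulative sum
    }

    // One-step Sarsa: Q(s,a) += alpha * (r + gamma * Q(s',a') - Q(s,a)).
    // At the end of a trial the value of the successor is taken to be 0.
    static void sarsaUpdate(double[][] q, int s, int a, double r, int s2, int a2,
                            boolean terminal, double alpha, double gamma) {
        double target = r + (terminal ? 0.0 : gamma * q[s2][a2]);
        q[s][a] += alpha * (target - q[s][a]);
    }
}
```

A high temperature makes the selection nearly uniform, while a low one makes it nearly greedy with respect to the current Q-values.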
Now, how can you play with the demo? If you don't like the layout of obstacles and goal cells, you can modify it using the left and right mouse buttons, respectively. This can be done at any time, either before or in the course of learning. You start the simulation by pressing the Start button. This clears the learner's knowledge, in this case by resetting its Q-values to 0. At any time you can suspend or resume the simulation using the Suspend and Resume buttons, respectively. When you have had enough, or want to be able to start from the beginning, press Stop. Whenever the simulation is suspended or stopped, the two text fields labeled "Trial" and "Step" display, respectively, the number of the current trial and the number of the last step made in the current trial. They are not updated after each step, since (in my Java development environment, at least) that turned out to be extremely memory-consuming. If you find the learner moving too fast, you can introduce a delay between steps using the scroll bar labeled "Delay". By default the only delay in the simulation is that required to redraw (an appropriate part of) the grid.
Both the learner and the grid environment have some modifiable properties, or parameters. These are, for the learner:
You can change any of these parameters whenever you like by typing a new value in the corresponding text field and pressing Return (trying to set an illegal value will have no effect). There is no enforced upper limit on the grid size, but you will probably not want to exceed 50x50, as this is the size of the look-up table used for function representation by the demo (sure, it could be changed dynamically, but it's fixed to keep things simple). You may wonder why the parameters are displayed in some strange order, which is neither logical nor alphabetical -- this is because 1) they are determined at run time and the class that displays them has no idea of what they mean, 2) regrettably the standard Java library has no sort routine, and 3) I didn't want to use some third-party sort (e.g., from the JGL) or to implement my own just to sort the properties alphabetically.
OK, that's probably all. Have fun. You can download the binaries as classes.jar and sources as either src.tar.gz or src.zip. The HTML documentation is available as doc.tar.gz or doc.zip. You can also browse it online. Questions, comments, or criticism welcome.