Vagabond Godsflaw (BETA)
 

Vagabond Godsflaw is a generic MUD robot for use in online RPG environments. The purpose of the Vagabond Godsflaw project is to conduct an academic study of reinforcement learning and its application to hostile environments. Vagabond Godsflaw is a patch to the TinyFugue MUD client Version 5.0 beta 6, released 2004-08-04. The basic idea behind the project is to implement an agent that can navigate a hostile environment, and to identify the areas of reinforcement learning best suited to this task. When finished, Vagabond Godsflaw should be capable of healing itself (health, mana, etc.); exploring its environment, with the ability to navigate to a specific location and to gravitate towards rewarding areas while avoiding the more dangerous ones; restocking its inventory as needed; and battling both offensively and defensively in the environment. Ideally, the only human intervention needed should be to configure the robot's environmental actuators and sensors.

 

The Preliminary Steps of a Case Study in Reinforcement Learning and its Application to Hostile Environments
 

For centuries vagabonds have lived what some would consider rote and onerous lives in an attempt to maintain their autonomy. For instance, a vagabond in mid-17th-century Europe would have been wise to move undetected so as to avoid the perils of compulsory military service, being put to work for idleness, or becoming a public example of merciless state power intended to coerce and control populations. Even the basic needs of existence, like food and medicine, were often dwarfed by the danger of falling victim to the plague or to the will of murderers. This environment's often violent and hostile nature may make it an interesting case study for the area of artificial intelligence known as reinforcement learning. That is, if one were able to construct and study a vagabond agent using work done in reinforcement learning, it might shed light on the usefulness of certain utility functions and their applications to the real world. By definition, an agent's purpose is to act. For this case study, and because it must mimic a complex world, the agent needs to act within a robust environment capable of providing somewhat unpredictable, yet avoidable, dangers. Clearly, developing such an environment is a huge task in its own right, never mind adding enough of a human element to make the study's application to the real world useful. One must identify a working environment model for the agent, decide how the agent will interact with that environment, and determine which areas of reinforcement learning look most promising for the agent's healthy existence.

As noted above, developing an environment with properties close enough to those of the real world is a daunting task. Luckily, thanks to the Internet, such environments have sprung up in the form of massively multiplayer online RPGs (Role Playing Games). In brief, online RPGs throw human and non-human actors together in a fantasy world that approximates, and perhaps sometimes exceeds, the volatility of the real world. These RPGs seem to meet all the prerequisites for a robust environment in which to test reinforcement learning in a vagabond agent. While RPGs are an obvious choice for applications of artificial intelligence, they have even attracted researchers studying more "real world" topics like economics and sociology. Edward Castronova, Associate Professor at Indiana University, has been studying these synthetic worlds and writes: "At this point we do not know how many people will eventually find a fantastical society preferable to our own. It could be hundreds of millions" (par. 5). While the social repercussions of such a prediction may invoke a plethora of reactions, there is one known benefit: it makes the RPG's environment that much more like the real world. That is, in a world packed with so many humans, and because of the socially detached nature of the Internet, one is certain to see some of the conditions described above, like abusive power and violent opportunism, present themselves. The despots and dictators of the real world take the form of all-powerful gods in these fantasy worlds, which, for the purpose of a vagabond agent, makes them perfect candidates for avoidance. In fact, since people tend to take these digital playgrounds so seriously, and the use of robots is frowned upon, everyone is worth avoiding. So the necessary conditions of a hostile environment are present in RPGs, which makes them ideal candidates for testing all sorts of reinforcement learning agents.

In fact, the conditions are so well met that we can move on to the question of what type of RPG best suits the agent's needs, and how the agent is to interact with it. MUDs (Multi-User Dimensions) are text-based RPGs that work over telnet, which makes them good candidates for agent design. Still, it was necessary to write some proof-of-concept code that allows an agent to interface with these environments through sensors and actuators. To avoid writing protocol-level code, I have chosen to modify a MUD client named TinyFugue. TinyFugue is written in C by Ken Keys and can be found at http://tf.tcp.com/~hawkeye/tf/. After close examination of the code I was able to wrap the beginnings of an agent infrastructure into TinyFugue's receiving, sending, and user interface functions. Additionally, I have added threads that provoke information from the environment and periodically back up state information to a MySQL database. A database backend has some more subtle implications, like the ability to save, load, and configure different agents at will, but the real key is the persistence of state information between invocations of the program. To see this proof-of-concept code, visit http://chris.dod.net/vagabond/ and download the current version of TinyFugue along with my thoughtfully named "Vagabond Godsflaw" agent patch. With the proof-of-concept work done, it is now necessary to discuss reinforcement learning, some of its history, and how it may affect the vagabond agent's design.

Stuart Russell and Peter Norvig argue that "reinforcement learning might be considered to encompass all of AI [...]" (764). Because of this, it will serve us to approach reinforcement learning from the areas that will be useful in designing a vagabond agent. John Stuart Mill, pioneer of utilitarian theory, states, "Actions are right in proportion as they tend to promote happiness; wrong as they tend to produce the reverse of happiness" (qtd. in White 34). This principle is ordinarily used to evaluate the morality of a decision; however, it has some reverse implications that are worth examining and that may help draw conclusions about the goal of reinforcement learning. Approaching Mill's principle from the perspective of the one who is acted upon may help illuminate the implication that social actors expect right actions to befall them. That is, the utilitarian principle seems to imply that one prefers happiness in, and from, one's environment, and seeks to avoid the opposite. By finding some calculus of environmental effects on an agent, one can develop a heuristic method that approximates happiness and sadness. An agent that uses such a method, formally called a utility-based agent, must have some mapping from states, percept sequences, or prior actions to utility. Some might argue that "During learning, the adaptive system tries some actions (i.e., output values) on its environment, then, it is reinforced by receiving a scalar evaluation (the reward) of its actions" (Perez-Uribe, par. 1). While not incorrect, this view fails to meet the requirements of a "hostile" environment, where misfortune may befall the agent regardless of its prior actions. The point here is that it may benefit the agent to apply scalar evaluations not only to its prior actions, but also to percept sequences and states.
With such a policy the agent could not only perform healthy actions, but also gravitate towards healthy states and take preemptive action to avoid unhealthy percept sequences. All of this discussion is relevant; however, once the abstraction is removed, there remains the very real problem of how the agent should be structured internally.

For complex environments like those discussed above, it may be beneficial to design the agent as a collection of smaller sub-agents, each performing a localized task and feeding back relevant heuristic information about its area. For example, the sub-agent in control of fighting may determine the best way to fight by assigning a utility function that evaluates the effectiveness of certain attacks against certain adversaries. One unambiguous measure of failure is whether the agent dies in the fight; in that case it may be beneficial to adjust the heuristic for that map location, adversary, or attack sequence accordingly. Similarly, a sub-agent in charge of movement may already know every exit of a room and favor the most rewarding one, but still periodically test the whole set of directional possibilities so as to find new options. This scheme is what Stuart Russell and Peter Norvig, in their book "Artificial Intelligence: A Modern Approach," call greedy in the limit of infinite exploration (GLIE). They suggest that a good way to implement a GLIE scheme is to "give some weight to actions that the agent has not tried very often, while tending to avoid actions that are believed to be of low utility" (774). Exploring the map may involve a graph structure such as a multiply connected linked list (an adjacency list) and, perhaps, Dijkstra's single-source shortest paths algorithm for direct navigation.

Andres Perez-Uribe, in his Introduction to Reinforcement Learning, discusses the SARSA algorithm and its application to maze learning. Unfortunately, SARSA may not be useful for the vagabond agent because it seems to value reaching some final goal state rather than the metric of basic exploration. For example, the SARSA implementation on Perez-Uribe's page solves a maze-learning problem, which tends to confirm my suspicions. As of now, there appear to be only bots that interact with others through conversation on the client side of MUDs. If there were bots that performed a task similar to the one proposed here, it would be useful to see whether they were implemented with any form of intelligent navigation and exploration. Many decisions will still shape the direction of development, but as things stand, Vagabond Godsflaw has a chance of leading its peers in the domain of MUDs.

 

Installing and Running Vagabond Godsflaw
 

Step One:
You must download the TinyFugue client. You can get Version 5.0 beta 6 either from the TinyFugue site or from this site (below). You will also need the patch (below), MySQL, and GDSL. Those of you lucky enough to run Slackware Linux can simply download the package below.

Tinyfugue Version 5.0 beta 6 | MD5 | SIGN
Vagabond Godsflaw Patch | MD5 | SIGN
Patched Slackware Package | MD5 | SIGN

Step Two:
You must install MySQL into /usr/local, or edit vars.mak.in (see Step Four) to change the search location of your MySQL libraries and header files.

Step Three:
You must install GDSL into /usr/local, or edit vars.mak.in (see Step Four) to change the search location of your GDSL libraries and header files.

Step Four:
Unpack TinyFugue Version 5.0 beta 6, which creates a directory called tf-50b6, change into that directory, and apply the patch:
<PRE>
tar xvfz tf-50b6.tar.gz
cd tf-50b6
patch -p1 < ../tf-50b6-godsflaw.patch
</PRE>
The patch should apply to the source code without a problem. Now make any changes to vars.mak.in required by Steps Two and Three.

Step Five:
Compile and install the patched TinyFugue client by issuing the following commands from the TinyFugue source directory:
<PRE>
./configure
make
make install
</PRE>
Hopefully you will have no trouble compiling the source.

Step Six:
Now that everything is installed, we have some configuring to do. Make sure you have the database name, username, and password for your MySQL account. If you do not already have access to a MySQL account, either ask your administrator for access or set one up yourself by consulting the MySQL manual. Once you have access to your database (we shall call it foobar), you must create the bot's MySQL tables. You can do this by downloading the following files and running the commands below:

<PRE>
mysql -u username -p foobar < create_bots.sql
mysql -u username -p foobar < create_healing.sql
mysql -u username -p foobar < create_provoke_env.sql
mysql -u username -p foobar < create_triggers.sql
</PRE>

where username and foobar are your MySQL username and database name, respectively. Note the -p flag: it makes mysql prompt you for your account password, which you should enter to log in.

Step Seven:
Now we need to configure TinyFugue to use your MySQL database. In the file .tfrc in your home directory, change the values for username, password, and database to your MySQL username, password, and database name.

Step Eight:
You should now be ready to type tf to start TinyFugue. Once you're in, issue the command /godsflaw -h to get a help message:

<PRE>
%
% Usage: /godsflaw [-bhls] [-c] [-o]
%        [-d] [-p""] [-P]
%        [-T]
%        [-t"" -a -A""]
%        [-t"" -a -f"" -v""]
%
% NOTE: leave NO spaces between option and argument
%
% -h    print this usage message
% -l    list current bots with names and bot_id
% -o    turn a bot with bot_id on or off (see -l)
%
% NOTE: bot must be ON for following commands:
%
% -p    add provocation string to bot ""
% -P    remove provocation from bot 
% -s    list all provocations
% -T    remove trigger from bot 
% -b    list all of bot's triggers
% -t    add trigger string to bot ""
% -a    set agent for some action trigger 
% -A    add action string to bot ""
% -f    add format string to bot ""
% -v    add variables for format string ""
%
% NOTE: bot must be OFF for following commands:
%
% -c    create new bot with 
% -d    delete bot with 
%                                                                              
</PRE>
Since you don't have a bot yet, create one with the command /godsflaw -cwalfsdog (remember: no space between option and argument), where walfsdog is whatever you would like your bot's name to be. You can then list the bots with the command /godsflaw -l:
<PRE>
% +-------------------------------------------------------------+
% | bot_id | Bot's Name                                         |
% +-------------------------------------------------------------+
% | 1      | walfsdog                                           |
% +-------------------------------------------------------------+
</PRE>
Once a bot is listed, you can turn it on by passing its bot_id to -o, as in /godsflaw -o1:
<PRE>
% Initializing godsflaw:
% - Allocating variables and data structures:
% - Getting MySQL connection variables
% - Loading data structures from MySQL.
% - Starting all agent functions.
% - godsflaw bot enabled                                                       
% - Dumping Data to MySQL.
</PRE>
Now you will need to add environment-specific actuators and sensors to the bot's database. One such actuator is a provocation. To add a provocation, type the command /godsflaw -p"STATUS", where STATUS is the command you would like the bot to send periodically to provoke the environment. The purpose of a provocation is to get the environment to give you information. If, for example, the bot needs to know whether it is day or night, there is most likely some command that provides this information. You can list all the active provocations with the command /godsflaw -s:
<PRE>
% +-------------------------------------------------------------+
% | provoke_id | Provocation                                    |
% +-------------------------------------------------------------+
% | 1          | INFO INV                                       |
% | 2          | STATUS                                         |
% | 3          | DEFENCES                                       |
% | 4          | SCORE                                          |
% | 5          | WIELDED                                        |
% | 6          | AB                                             |
% | 7          | falcon assess                                  |
% | 8          | WEATHERING                                     |
% +-------------------------------------------------------------+              
</PRE>
Adding triggers can be much more complicated, so pay close attention to the example below. To list all the bot's triggers you can type /godsflaw -b. Here are some sample triggers that we shall refer to later:
<PRE>
% +-------------------------------------------------------------+
% trigger id:  1
% agent:  1
% trigger:  H:
% format string:  H:%s M:%s E:%s W:%s %*s
% variable position 0:  5
% variable position 1:  6
% variable position 2:  7
% variable position 3:  8
% +-------------------------------------------------------------+
% trigger id:  2
% agent:  1
% trigger:  Health:
% format string:  Health: %*d/%s  Mana: %*d/%s
% variable position 0:  1
% variable position 1:  2
% +-------------------------------------------------------------+
% trigger id:  3
% agent:  1
% trigger:  Endurance:
% format string:  Endurance: %*d/%s  Willpower: %*d/%s
% variable position 0:  3
% variable position 1:  4
% +-------------------------------------------------------------+
% trigger id:  4
% agent:  0
% trigger:  [eb]
% format string:  H:%*d M:%*d E:%*d W:%*d [%1s%1s]
% variable position 0:  9
% variable position 1:  10
% +-------------------------------------------------------------+
% trigger id:  5
% agent:  0
% trigger:  [e-]
% format string:  H:%*d M:%*d E:%*d W:%*d [%1s%1s]
% variable position 0:  9
% variable position 1:  10
% +-------------------------------------------------------------+
% trigger id:  6
% agent:  0
% trigger:  [-b]
% format string:  H:%*d M:%*d E:%*d W:%*d [%1s%1s]
% variable position 0:  9
% variable position 1:  10
% +-------------------------------------------------------------+
% trigger id:  7
% agent:  0
% trigger:  [--]
% format string:  H:%*d M:%*d E:%*d W:%*d [%1s%1s]
% variable position 0:  9
% variable position 1:  10
% +-------------------------------------------------------------+
% trigger id:  8
% agent:  0
% trigger:  [p eb]
% format string:  H:%*d M:%*d E:%*d W:%*d [%1s %1s%1s]
% variable position 0:  11
% variable position 1:  9
% variable position 2:  10
% +-------------------------------------------------------------+
% trigger id:  9
% agent:  0
% trigger:  [p e-]
% format string:  H:%*d M:%*d E:%*d W:%*d [%1s %1s%1s]
% variable position 0:  11
% variable position 1:  9
% variable position 2:  10
% +-------------------------------------------------------------+
% trigger id:  10
% agent:  0
% trigger:  [p -b]
% format string:  H:%*d M:%*d E:%*d W:%*d [%1s %1s%1s]
% variable position 0:  11
% variable position 1:  9
% variable position 2:  10
% +-------------------------------------------------------------+
% trigger id:  11
% agent:  0
% trigger:  [p --]
% format string:  H:%*d M:%*d E:%*d W:%*d [%1s %1s%1s]
% variable position 0:  11
% variable position 1:  9
% variable position 2:  10
% +-------------------------------------------------------------+
% trigger id:  12
% agent:  0
% trigger:  [- eb]
% format string:  H:%*d M:%*d E:%*d W:%*d [%1s %1s%1s]
% variable position 0:  11
% variable position 1:  9
% variable position 2:  10
% +-------------------------------------------------------------+
% trigger id:  13
% agent:  0
% trigger:  [- e-]
% format string:  H:%*d M:%*d E:%*d W:%*d [%1s %1s%1s]
% variable position 0:  11
% variable position 1:  9
% variable position 2:  10
% +-------------------------------------------------------------+
% trigger id:  14
% agent:  0
% trigger:  [- -b]
% format string:  H:%*d M:%*d E:%*d W:%*d [%1s %1s%1s]
% variable position 0:  11
% variable position 1:  9
% variable position 2:  10
% +-------------------------------------------------------------+
% trigger id:  15
% agent:  0
% trigger:  [- --]
% format string:  H:%*d M:%*d E:%*d W:%*d [%1s %1s%1s]
% variable position 0:  11
% variable position 1:  9
% variable position 2:  10
% +-------------------------------------------------------------+
% trigger id:  16
% agent:  0
% trigger:  feeling deliciously well-rested
% action:  stand
% +-------------------------------------------------------------+
% trigger id:  17
% agent:  0
% trigger:  You have been slain
% action:  APPROACH MIRROR
% +-------------------------------------------------------------+
% trigger id:  18
% agent:  0
% trigger:  REJECT GRACE
% action:  REJECT GRACE
% +-------------------------------------------------------------+
% trigger id:  19
% agent:  0
% trigger:  WARES
% action:  WARES
% +-------------------------------------------------------------+
% trigger id:  20
% agent:  0
% trigger:  Ratman stands here quietly
% action:  sell rat to Ratman
% +-------------------------------------------------------------+
% trigger id:  21
% agent:  0
% trigger:  You have slain a baby rat
% action:  get rat
% +-------------------------------------------------------------+
% trigger id:  22
% agent:  0
% trigger:  You have slain a young rat
% action:  get rat
% +-------------------------------------------------------------+
% trigger id:  23
% agent:  0
% trigger:  You have slain an old rat
% action:  get rat
% +-------------------------------------------------------------+
% trigger id:  24
% agent:  0
% trigger:  You have slain a rat
% action:  get rat
% +-------------------------------------------------------------+
% trigger id:  25
% agent:  0
% trigger:  You have slain a black rat
% action:  get rat
% +-------------------------------------------------------------+
% trigger id:  26
% agent:  0
% trigger:  The bright sun shines down upon you
% action:  falcon train in stamina
% +-------------------------------------------------------------+
% trigger id:  27
% agent:  0
% trigger:  Type MORE to continue
% action:  more
% +-------------------------------------------------------------+
% trigger id:  28
% agent:  0
% trigger:  The night sky is clear
% action:  falcon train in strength
% +-------------------------------------------------------------+
</PRE>
Let's look at the last trigger (28). This is the most basic form of trigger, used to perform some action when the environment behaves in a certain way. That is, for trigger 28, when the environment says "The night sky is clear" the bot responds with "falcon train in strength". Notice also that the agent id is 0, which means the trigger is handled by the general agent rather than by any particular sub-agent. One could add such a trigger with the command /godsflaw -t"The night sky is clear" -a0 -A"falcon train in strength". Now let us examine a more complicated trigger. The other form of trigger works for either a sub-agent or the general bot by parsing the line that activated the trigger and filling some internal variables. Adding these triggers can be very dangerous, since one must get the format string and variables exactly right. First let us look at the output of /godsflaw -v:
<PRE>
% - This is a string of variable IDs that
% - will be used with for format option.
% - Each ID should be seperated by a space
% - and there should be no more than 10
% - IDs.  The option should look like this:
% - -v"8 6 5 3 2 1"
% - You will find ID to variable mappings
% - below.  NOTE: A means the agents that
% - you would pass on the command line with
% - the -a option.  You MUST not cross
% - agent IDs in the variable string.  That
% - is, -v"8 9" would be BAD
% - given the table below, because 8 and 9
% - are handled by different agents.  The T
% - field is the type this variable will be
% - converted to.  So ID 3 below will be
% - changed to and integer number.  That
% - means if you get something other than
% - a number from the format string into
% - variable 3, expect unpredictable
% - results.
% - ---------------------------------------
% - | ID | A | T |Description             |
% - ---------------------------------------
% - | 1  | 1 | i | max health             |
% - | 2  | 1 | i | max mana               |
% - | 3  | 1 | i | max endurance          |
% - | 4  | 1 | i | max willpower          |
% - | 5  | 1 | i | current health         |
% - | 6  | 1 | i | current mana           |
% - | 7  | 1 | i | current endurance      |
% - | 8  | 1 | i | current willpower      |
% - | 9  | 0 | s | equilibrium            |
% - | 10 | 0 | s | balance                |
% - | 11 | 0 | s | prone                  |
% - ---------------------------------------
</PRE>
The -v option tells us about the variables and which agent each one belongs to. It is very important to fill all of these variables for the bot to perform properly. The ID field is the identifier we pass to -v when we add a trigger with variables. The A field tells us which agent the variable belongs to; be sure to heed the warning above about crossing agent IDs. The T field tells us the type the data is stored as in the bot. That is, since ID 1 has type (i), we know the data is stored as an integer and we should not allow a string or character to pollute that variable. Since we know the agent each variable belongs to, we can talk about the -a option: /godsflaw -a
<PRE>
% - ----------------------------------
% - You must choose an agent by number
% - This agent will handle the trigger
% - ----------------------------------
% - 0 = general agent
% - 1 = healing agent
% - 2 = movement agent
% - 3 = inventory agent
% - 4 = fighting agent
% - ----------------------------------                                         
</PRE>
Basically, the table explains it all: each agent has an ID, and this is how we tell a trigger that it belongs to a particular sub-agent. The -f option is VERY helpful in understanding how to construct the format string: /godsflaw -f:
<PRE>
% - This should take the form of a standard
% - SCANF(3) format string.  Yes this is
% - very insecure because you can cook the
% - format string, but I really had no time
% - to come up with anything better. NOTE:
% - you can do a man (i.e. man 3C scanf)
% - to find out more on the format string.
% - Your format string can only have up to
% - 10 varaibles that will be specified
% - with the -v option.  These varaibles
% - must all be of type string "%s" and
% - no longer than 32 in size.                                                 
</PRE>
Now that we know all of this, we can put together a trigger like trigger 1 above. To add it we would use the command /godsflaw -t"H:" -a1 -v"5 6 7 8" -f"H:%s M:%s E:%s W:%s %*s". Note that every conversion that fills a variable must be %s; suppressed conversions such as %*d, which fill nothing, may also appear, as triggers 2 through 15 show. There is a very real technical reason for the %s restriction, but I will not go into it here. One more detail: the %*s specifier is the equivalent of a "don't care." It does not fill any variable; it just helps the format string match more accurately. Hopefully this gets you started, or at least gives you an idea of how things work.

 

Development and Progress Update 11/08/2004
 

Progress:

The Vagabond Godsflaw project has completed the framework development stage. To minimize the work of building infrastructure for TCP/IP connections and basic client-server message passing, Vagabond Godsflaw has been built on top of the screen-oriented MUD client TinyFugue (http://tf.tcp.com/~hawkeye/tf/). The TinyFugue client is written in C, so all prototype code is also in C. In addition, I have chosen the world of Aetolia (http://www.aetolia.com/) as the primary environment for testing the agent.

To interface with the environment, the agent must control some actuators and sensors. All I needed to do was find key locations in the TinyFugue code where I could monitor incoming traffic and inject outgoing traffic, so that I could call my own sensor and actuator functions, respectively. Additionally, I needed a way for the user to issue commands to the agent so that he or she could populate the necessary data structures. TinyFugue already had an argument-handling structure, which made this relatively easy. Once these three interfaces with the TinyFugue client were finished, I began developing Vagabond Godsflaw's basic framework.

The basic framework of the agent needed to allow multiple sub-agent threads to feed off shared data structures that persist across invocations of the program. For this first part I used POSIX threads to spawn each agent and a few key functions, like the ticker and the MySQL dump function. The command that turns the agent on populates the data structures with persistent information from the MySQL database tables and spawns each of the sub-agent threads. There are four sub-agent threads that collectively make up the whole agent, which shall be discussed in detail later. Each thread must access global data that may be populated by the sensor, used by the actuator, or shared between sub-agents. There were, and will continue to be, extensive concurrency and locking concerns during development. For example, when one of the agents wishes to send a command to the environment, it must add the command to a queue that the actuator thread uses. Once the queue is populated, the agent broadcasts that the queue has data in it, and the actuator thread pulls the data out of the queue and sends it to the environment. Unfortunately, the process is a bit more complicated, since the actuator cannot simply hammer the environment with commands. That is, under certain conditions, the actuator must wait for the environment to signal that it is ready for the command before sending it. The threads' concurrency considerations are rivaled only by the difficulty of debugging their runtime errors. For this reason, much careful consideration has gone into the design of these threads and will go into their future code. The data structures are another matter altogether.

While it is perhaps a given that the Java or C++ programmer will use the standard API or the STL, it is not so clear what the C programmer is going to do for standard data structures. I have found a great resource called the Generic Data Structure Library (GDSL) for C. The GDSL project can be found at http://www.nongnu.org/gdsl/, and I intend to use it for all of my basic data structure needs; I have already employed it for linked lists, queues, and hash tables. Unfortunately, there seems to be no priority queue in the library, which may mean I need to add this functionality myself. Since the movement agent's A* search will likely need a priority queue, I may have no other choice. As of now, the agent's data is held in specific structures that are stored in these generic data structures. For example, there is a provocation structure that is stored in a linked list. The user may populate this structure with provocations to send to the environment, which may prompt the environment to send useful information back. This data is saved to the persistent database periodically and can be accessed by multiple threads. Since GDSL is reentrant, all one must do is lock a mutex before accessing a data structure and unlock it afterwards. Since data structures are a basic subject, let us move on to the question of persistent data.

On each invocation of the agent, the data from its previous runs should persist. For this reason I have used a MySQL (http://www.mysql.com/) database for data structure storage. That is, at runtime all data comes from basic in-memory data structures for fast algorithms and access, but persistent storage is handled by MySQL rather than flat files. The need for persistent storage stems from the fact that the environment is so complicated that it requires user configuration. Additionally, since the program is designed to work with different environments, it seemed only fitting to let the user configure each agent's actuators and sensors. The basic layout of the databases is shown in figure 1:

<PRE>
mysql> show tables;     
+----------------+
| Tables_in_bots |
+----------------+
| bots           |
| healing        |
| provoke_env    |
| triggers       |
+----------------+
4 rows in set (0.00 sec)
                       
mysql> describe bots;
+--------+------------------+------+-----+---------+----------------+
| Field  | Type             | Null | Key | Default | Extra          |
+--------+------------------+------+-----+---------+----------------+
| bot_id | int(10) unsigned |      | PRI | NULL    | auto_increment |
| name   | varchar(64)      |      |     |         |                |
+--------+------------------+------+-----+---------+----------------+
2 rows in set (0.00 sec)
                       
mysql> describe healing;
+--------------+------------------+------+-----+---------+----------------+
| Field        | Type             | Null | Key | Default | Extra          |
+--------------+------------------+------+-----+---------+----------------+
| heal_id      | int(10) unsigned |      | PRI | NULL    | auto_increment |
| bot_id       | int(10) unsigned |      |     | 0       |                |
| target       | int(11)          |      |     | 0       |                |
| gain         | int(11)          |      |     | 0       |                |
| onlyInBattle | int(11)          |      |     | 0       |                |
| cmd          | text             |      |     |         |                |
+--------------+------------------+------+-----+---------+----------------+
6 rows in set (0.00 sec)

mysql> describe provoke_env;
+-------------+------------------+------+-----+---------+----------------+
| Field       | Type             | Null | Key | Default | Extra          |
+-------------+------------------+------+-----+---------+----------------+
| provoke_id  | int(10) unsigned |      | PRI | NULL    | auto_increment |
| bot_id      | int(10) unsigned |      |     | 0       |                |
| provocation | varchar(255)     |      |     |         |                |
+-------------+------------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)

mysql> describe triggers;
+------------+------------------+------+-----+---------+----------------+
| Field      | Type             | Null | Key | Default | Extra          |
+------------+------------------+------+-----+---------+----------------+
| trigger_id | int(10) unsigned |      | PRI | NULL    | auto_increment |
| bot_id     | int(10) unsigned |      |     | 0       |                |
| agent      | int(11)          |      |     | 0       |                |
| trigger    | text             |      |     |         |                |
| action     | text             |      |     |         |                |
| format     | text             |      |     |         |                |
| var_01     | int(11)          |      |     | 0       |                |
| var_02     | int(11)          |      |     | 0       |                |
| var_03     | int(11)          |      |     | 0       |                |
| var_04     | int(11)          |      |     | 0       |                |
| var_05     | int(11)          |      |     | 0       |                |
| var_06     | int(11)          |      |     | 0       |                |
| var_07     | int(11)          |      |     | 0       |                |
| var_08     | int(11)          |      |     | 0       |                |
| var_09     | int(11)          |      |     | 0       |                |
| var_10     | int(11)          |      |     | 0       |                |
+------------+------------------+------+-----+---------+----------------+
16 rows in set (0.00 sec)

Figure 1
</PRE>
There will be more tables in the future, but figure 1 should give a good idea of the type of information being stored. Notice that each table has a bot_id field. This field lets the user create and configure multiple bots, which he or she can load at will. This design consideration is important if the user is testing multiple agents in one or many worlds. It does not mean one can run multiple bots from the same runtime instance of the program, but one can run multiple instances of the program at the same time. This feature is important for testing the agent, which shall be discussed later on. When the user sends the "/godsflaw -o1" command, the program checks the current state of the agent with bot_id 1 and turns it on or off. On initialization the agent loads all stored information into the data structures and sets up a sub-agent that listens for broadcasts. Another agent sends a broadcast every so many seconds. Each time the dump agent receives a broadcast, it writes the contents of every persistent data structure to its respective table. This concludes the discussion of persistent data storage, but it raises the interesting question of the agent's command infrastructure.
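As a minimal sketch of what the dump agent might send for one row of provoke_env, the function below formats a MySQL `REPLACE` statement. `REPLACE` overwrites any existing row that has the same primary key. The function name is hypothetical. The hand-rolled escaping is only there so the example runs without a server. Real code would call `mysql_real_escape_string()` from the MySQL C API on an open connection, and the caller would hold the list's mutex while reading the provocation.

```c
#include <stdio.h>
#include <string.h>

#define PROVOCATION_MAX 255  /* matches varchar(255) in provoke_env */

/* Minimal escaping of quotes and backslashes for a single-quoted SQL
 * literal. A stand-in only: mysql_real_escape_string() also handles NUL,
 * newlines, and character-set issues. */
static void escape_sql(char *dst, const char *src)
{
    while (*src) {
        if (*src == '\'' || *src == '\\')
            *dst++ = '\\';
        *dst++ = *src++;
    }
    *dst = '\0';
}

/* Build the statement for one provocation (hypothetical helper).
 * Returns 0 on success, -1 if the text is too long or buf is too small. */
int format_provocation_sql(char *buf, size_t size, unsigned provoke_id,
                           unsigned bot_id, const char *text)
{
    char esc[2 * PROVOCATION_MAX + 1];  /* worst case: every char escaped */
    int n;

    if (strlen(text) > PROVOCATION_MAX)
        return -1;
    escape_sql(esc, text);
    n = snprintf(buf, size,
                 "REPLACE INTO provoke_env (provoke_id, bot_id, provocation) "
                 "VALUES (%u, %u, '%s')",
                 provoke_id, bot_id, esc);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```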

Commands are handled through the normal TinyFugue channels, as discussed above. The interesting question is not how the command channel is implemented, but which commands the user can run. Figure 2 contains the output of the "/godsflaw -h" command:
<PRE>
% Usage: /godsflaw [-bhls] [-c] [-o]
%        [-d] [-p""] [-P]   
%        [-T]                      
%        [-t"" -a -A""]
%        [-t"" -a -f"" -v""]
%                                                    
% NOTE: leave NO spaces between option and argument
%                                                  
% -h    print this usage message
% -l    list current bots with names and bot_id
% -o    turn a bot with bot_id on or off (see -l)
%                                                
% NOTE: bot must be ON for following commands:
%                                             
% -p    add provocation string to bot ""
% -P    remove provocation from bot 
% -s    list all provocations
% -T    remove trigger from bot 
% -b    list all of bot's triggers
% -t    add trigger string to bot ""
% -a    set agent for some action trigger 
% -A    add action string to bot ""
% -f    add format string to bot ""
% -v    add variables for format string ""
%
% NOTE: bot must be OFF for following commands:
%
% -c    create new bot with 
% -d    delete bot with 

Figure 2
</PRE>
As can be seen in figure 2, some commands work when the agent is off and others only when it is on. When the agent is off, one can list, create, or delete an agent. After listing the agents, the user can use a bot_id to turn that agent on or off. Once the bot is on, one can list, add, or remove a provocation, or list, add, or remove a trigger. It is the latter option that is quite complex, and it is necessary to understand it for a thorough treatment of the sub-agents.

Triggers are the mechanism through which the user can get the agent to act. When the agent observes a trigger, it must take the action that the user specifies. The most basic trigger, included for completeness, is the trigger/action pair. The command may look like /godsflaw -t"you have fallen" -a0 -A"stand". This tells the general agent (agent 0) that when it sees the string "you have fallen" it should send the command "stand". There is really no interesting AI application here, so we will move on to the more interesting trigger. This next trigger tells some sub-agent to capture information from the environment and fill variables with it. For example, the command

/godsflaw -t"Your health is" -a1 -f"Your health is %s" -v"5" will tell the healing agent to fill the current health variable. One can see the valid values for the -v and -a options in figure 3 by issuing the commands "/godsflaw -v" and "/godsflaw -a" respectively.
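To make the -f option concrete, here is a sketch that assumes each %s placeholder captures text up to the next literal character of the format. If %s ends the format, it captures a run of non-space characters instead. The function `match_format` and the `VAR_LEN` limit are hypothetical, and the real patch may rely on TinyFugue's own pattern matching instead. The captured strings would then be converted according to the T column of figure 3, for example with `strtol()` for integer variables.

```c
#include <ctype.h>
#include <string.h>

#define VAR_LEN 32  /* hypothetical max length of one captured value */

/* Match a MUD line against a trigger format with %s placeholders.
 * Returns the number of captures stored in out[], or -1 on mismatch.
 * Text after the end of the format is ignored. */
int match_format(const char *fmt, const char *line,
                 char out[][VAR_LEN], int max)
{
    int n = 0;

    while (*fmt) {
        if (fmt[0] == '%' && fmt[1] == 's') {
            char stop = fmt[2];  /* literal that ends this capture, or '\0' */
            size_t len = 0;

            if (n >= max)
                return -1;
            while (*line && (stop ? *line != stop
                                  : !isspace((unsigned char)*line))) {
                if (len < VAR_LEN - 1)
                    out[n][len++] = *line;  /* long values are truncated */
                line++;
            }
            if (len == 0)
                return -1;  /* an empty capture counts as no match */
            out[n][len] = '\0';
            n++;
            fmt += 2;
        } else if (*fmt++ != *line++) {
            return -1;  /* literal text differs */
        }
    }
    return n;
}
```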

<PRE>
/godsflaw -v
% - This is a string of variable IDs that
% - will be used with the format option.
% - Each ID should be separated by a space
% - and there should be no more than 10
% - IDs.  The option should look like this:
% - -v"8 6 5 3 2 1"
% - You will find ID to variable mappings
% - below.  NOTE: A means the agents that
% - you would pass on the command line with
% - the -a option.  You MUST not cross
% - agent IDs in the variable string.  That
% - is, -v"8 9" would be BAD
% - given the table below, because 8 and 9
% - are handled by different agents.  The T
% - field is the type this variable will be
% - converted to.  So ID 3 below will be
% - changed to an integer number.  That
% - means if you get something other than
% - a number from the format string into
% - variable 3, expect unpredictable
% - results.
% - ---------------------------------------
% - | ID | A | T |Description             |
% - ---------------------------------------
% - | 1  | 1 | i | max health             |
% - | 2  | 1 | i | max mana               |
% - | 3  | 1 | i | max endurance          |
% - | 4  | 1 | i | max willpower          |
% - | 5  | 1 | i | current health         |
% - | 6  | 1 | i | current mana           |
% - | 7  | 1 | i | current endurance      |
% - | 8  | 1 | i | current willpower      |
% - | 9  | 0 | s | equilibrium            |
% - | 10 | 0 | s | balance                |
% - | 11 | 0 | s | prone                  |
% - ---------------------------------------

/godsflaw -a
% - ----------------------------------
% - You must choose an agent by number
% - This agent will handle the trigger
% - ----------------------------------
% - 0 = general agent
% - 1 = healing agent
% - 2 = movement agent
% - 3 = inventory agent
% - 4 = fighting agent
% - ----------------------------------

Figure 3
</PRE>
The trigger command above is important for filling each agent's variables. The assumption is that every world has some generic set of commands that will fill these variables. Once an agent's variables are filled, that agent can perform within the environment. This leads us into the sub-agent discussion.
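Figure 3 forbids a -v string that crosses agents, and this rule can be checked when the trigger is added. In this sketch the ID-to-agent table is copied from figure 3. The function `parse_var_ids` and its return convention are assumptions made for illustration.

```c
#include <ctype.h>
#include <stdlib.h>

#define MAX_VAR_IDS 10
#define NUM_VAR_IDS 11

/* Owning agent per variable ID, from figure 3 (index 0 unused):
 * IDs 1-8 -> healing agent (1), IDs 9-11 -> general agent (0). */
static const int var_agent[NUM_VAR_IDS + 1] = {
    -1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0
};

/* Parse a -v string such as "8 6 5 3 2 1" into ids[]. Returns the count,
 * or -1 if the string is malformed or empty, names an unknown ID, holds
 * more than MAX_VAR_IDS IDs, or uses an ID not owned by `agent`. */
int parse_var_ids(const char *spec, int agent, int ids[MAX_VAR_IDS])
{
    const char *p = spec;
    char *end;
    int n = 0;

    for (;;) {
        long id = strtol(p, &end, 10);  /* skips leading whitespace */
        if (end == p)
            break;  /* no further number */
        if (id < 1 || id > NUM_VAR_IDS || var_agent[id] != agent)
            return -1;  /* unknown ID or crossed agents */
        if (n == MAX_VAR_IDS)
            return -1;  /* too many IDs */
        ids[n++] = (int)id;
        p = end;
    }
    while (isspace((unsigned char)*p))
        p++;
    return (*p == '\0' && n > 0) ? n : -1;
}
```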

Before we go into the sub-agent discussion, it is worth mentioning that the infrastructure seems to be behaving properly. That is, any future data structures or user-configurable commands should be relatively easy to drop in place, which should leave me more time for agent development. That being said, much of my time has been spent thinking about how to build a solid infrastructure, and less on each agent's implementation. I am halfway through the healing agent's design and testing, as will be discussed below, but I have very few ideas on how the inventory and fighting agents are going to work. The problem I face here is that, while there are parallels between the behaviors of the healing, inventory, and fighting agents, I am not sure the reinforcement learning work I have done so far is sufficient. That is, had I taken on a similar project before exploring reinforcement learning, I am not sure I would have done it any differently. I must stress that I only feel this way with regard to the healing agent, and it is something I would like to fix in the fighting and inventory agents.

While the healing agent is the one I chose to develop first, it is the movement agent that will pull everything together. The environment contains different rooms that the agent must detect, and this is the key problem the movement agent faces. As humans we can pattern-match blocks of text and tell when we are in a room, but this is much more difficult for the agent. Say the movement agent walks into some room by issuing the command "south". Once the agent puts the command in the actuator queue, there is no guarantee that the next sensor reading the agent gets from the environment will be the room title. That is, each state in this environment will be indexed through a hash table by the name of the room, but the agent cannot be sure of that name. After much consideration, I have decided that the best way for an unsure human to check which room he or she is in would be to issue the command again. So the agent needs some threshold (e.g. best 3 out of 5) that tells it for sure which room it is in. That is, after issuing the command X times, if the same room name is observed Y times and Y > X - Y, then we know which state we are in and can look it up in the hash table. There will be a user-specified list of movement commands that will be tried in each room. Once all commands have been tried, there will be some random chance for the movement agent to try dead choices again. It also seems necessary for the movement agent to allow for different types of movement. Perhaps the agent wants to find a shop to buy health items, find the best area to battle in, or just explore the environment. The first two situations seem perfect for an A* search, while there are still some interesting considerations for how one should blindly explore the environment. There may even be a tracking mode that the agent can be put into. These are all worthwhile areas of thought when making the movement agent, but my head is still with the health agent.
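The best X-out-of-Y check for room titles amounts to a strict-majority vote. The sketch below assumes the observed titles have already been collected into an array. `vote_room` is a hypothetical name, and the quadratic scan is acceptable only because X is small (e.g. 5).

```c
#include <stddef.h>
#include <string.h>

/* Given the titles seen after issuing a command x times, return the title
 * observed y times with y > x - y (a strict majority), or NULL if none
 * qualifies. NULL entries mean "no room title was seen that time". */
const char *vote_room(const char *obs[], int x)
{
    for (int i = 0; i < x; i++) {
        int y = 0;
        if (obs[i] == NULL)
            continue;
        for (int j = 0; j < x; j++)
            if (obs[j] != NULL && strcmp(obs[i], obs[j]) == 0)
                y++;
        if (y > x - y)
            return obs[i];  /* safe to look this name up in the hash table */
    }
    return NULL;
}
```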

For the health agent I have already completed much of the work. In the environment, if I gain or lose health the agent can detect that. If I lose health, for instance, the agent evaluates whether I am in danger of dying and will heal me if needed. Each healing command belongs to one of two sets: either it can or cannot be used in battle. If the agent is in battle, it will not use one of the non-battle commands. Each command has a field called gain, as can be seen in the healing table of figure 1. This field records the last gain in health the command provided, which gives a fairly accurate measure of how much a single command can heal. That measure is then compared against all other commands. If a command gives the largest gain in health without overshooting the maximum health, it is chosen. The other elements of the health agent refill mana and other stats. The knowledge of loss in health or death can be used to update the state information, so the agent can learn the best areas to travel and fight. This type of information will be incorporated into some heuristic that works for the type of movement the agent is currently performing. For example, the healing agent may tell the movement agent to flee battle. The movement agent should, however, be smart enough not to go into a more dangerous area while fleeing.
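The selection rule above can be sketched as follows. The sketch reads "without capping the max-health" as "without pushing health above its maximum". `struct heal_cmd` mirrors the gain and cmd columns of the healing table. `usable_in_battle` is a hypothetical flag standing in for the battle/non-battle split, because the exact meaning of the onlyInBattle column is not stated. Returning -1 when no command fits is also an assumption.

```c
#include <stddef.h>

/* Hypothetical in-memory mirror of one healing-table row. */
struct heal_cmd {
    const char *cmd;       /* command text sent to the MUD */
    int gain;              /* last health gain this command produced */
    int usable_in_battle;  /* 0 = non-battle command */
};

/* Pick the eligible command with the largest recorded gain that does not
 * push health above max_hp. Returns its index, or -1 if none fits. */
int choose_heal(const struct heal_cmd *cmds, size_t n,
                int cur_hp, int max_hp, int in_battle)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (in_battle && !cmds[i].usable_in_battle)
            continue;  /* never use non-battle commands during a fight */
        if (cur_hp + cmds[i].gain > max_hp)
            continue;  /* would overshoot the maximum */
        if (best < 0 || cmds[i].gain > cmds[best].gain)
            best = (int)i;
    }
    return best;
}

/* After a command runs, remember the gain it actually produced. */
void record_gain(struct heal_cmd *c, int hp_before, int hp_after)
{
    c->gain = hp_after - hp_before;
}
```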

The inventory and fighting agents have some interesting considerations of their own. Nevertheless, I still think of these agents as secondary to the primary project. That is, I am focusing my effort on the healing and movement agents so as not to overwhelm myself. If I get to these agents in time, I would like to address the concerns above; namely, I want a thorough and fair evaluation of the best reinforcement learning approach for this context. But enough of the framework and agent design; let's discuss testing procedures.

Testing and Evaluation:

I shall address the two issues, testing and evaluation, separately. For testing there are a few approaches that should yield good results. First, TinyFugue is built with a good debugging facility. That is, when it is run from GDB and one reproduces an error that leads to a fatal runtime bug, TinyFugue seems to catch the failure and tell the user which GDB command will generate a stack trace. This information has been quite useful in catching catastrophic runtime failures. In addition, I am planning on adding a logging facility for all incoming and outgoing data, if one does not already exist. This should allow the bot to run for extended periods of time, and if it behaves strangely in some situation, I can do a post-mortem on the logs. Perhaps the most useful ability I built in is running multiple bots. This way I can set up numerous bots, running from different invocations of the program, to evaluate a spectrum of their behaviors. This lets me catch choices in the code that may have been made too specific to the character the bot was playing at the time. I have also recruited two teenagers who live on the same island I do to test the bot. This means I will get user feedback, which should give my program a fair shake.
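For the planned logging facility, a minimal sketch might timestamp every line and flush it immediately, so the log survives a crash. The function `log_traffic` and the `<`/`>` direction markers are hypothetical. TinyFugue's built-in /log command may already cover part of this need.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Append one timestamped line to the session log. dir is '<' for data
 * received from the MUD and '>' for commands the bot sends. Only the text
 * before the first '\r' or '\n' is written. Flushing after every line keeps
 * the log usable for a post-mortem even if the client crashes.
 * Returns 0 on success, -1 on a write error. */
int log_traffic(FILE *fp, char dir, const char *text)
{
    char stamp[32];
    time_t now = time(NULL);
    struct tm *lt = localtime(&now);
    int len = (int)strcspn(text, "\r\n");  /* drop the MUD's line ending */

    if (lt == NULL || strftime(stamp, sizeof stamp, "%Y-%m-%d %H:%M:%S", lt) == 0)
        snprintf(stamp, sizeof stamp, "?");
    if (fprintf(fp, "%s %c %.*s\n", stamp, dir, len, text) < 0)
        return -1;
    return fflush(fp) == 0 ? 0 : -1;
}
```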

The second issue, evaluation, should be taken care of through the agent's ability to dump all data to the database. That is, I can evaluate improvements in the bot's actions by looking at the data in the database. For example, I can see all the state information and how many times the bot has died. I can then compare the information recorded at the beginning of some period, say before I go to sleep, with that of a later period, say when I wake up. Ultimately there are many dimensions along which the bot can improve: finding new areas of the map is one, not dying is two, taking less time in battle is three, and choosing the optimal attacks is four. There is an extremely large set of actions that may be used as a metric for the agent's success. Again, it would seem best to segment these into smaller sections for analysis. Since the healing agent is concerned with a different function than the fighting agent, it makes sense to judge their success differently. That being said, since it is not yet clear that there will be time for the fighting and inventory agents, I will focus on the areas of interest to the healing and movement agents.

The best metric for measuring the success of the healing and movement agents collectively is how often the agent dies. This should be awful at first but improve substantially with time, which would mean that the healing agent is healing more effectively and the movement agent is avoiding dangerous areas better. There are, of course, different considerations for each agent, but this is the ultimate evaluation. Again, collecting this information over time, perhaps in an automated fashion, would give the best measure.
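Comparing two periods, such as before sleep and after waking, reduces to a rate calculation. The sketch assumes a cumulative death counter is stored somewhere in the database, which the tables in figure 1 do not yet include. `struct death_snapshot` and `deaths_per_hour` are hypothetical names.

```c
#include <time.h>

/* Hypothetical snapshot of the bot's cumulative death counter. */
struct death_snapshot {
    time_t taken;  /* when the snapshot was recorded */
    long deaths;   /* cumulative deaths so far */
};

/* Deaths per hour between two snapshots; returns -1.0 if the interval
 * is empty or reversed. A falling rate over successive periods is the
 * improvement the evaluation looks for. */
double deaths_per_hour(struct death_snapshot a, struct death_snapshot b)
{
    double hours = difftime(b.taken, a.taken) / 3600.0;
    if (hours <= 0.0)
        return -1.0;
    return (double)(b.deaths - a.deaths) / hours;
}
```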

 

TODO List and Prototype Information
 

Healing Agent:
For the health agent I have already completed much of the work. In the environment, if I gain or lose health the agent can detect that. If I lose health, for instance, the agent evaluates whether I am in danger of dying and will heal me if needed. Each healing command belongs to one of two sets: either it can or cannot be used in battle. If the agent is in battle, it will not use one of the non-battle commands. Each command has a field called gain, as can be seen in the healing table above. This field records the last gain in health the command provided, which gives a fairly accurate measure of how much a single command can heal. That measure is then compared against all other commands. If a command gives the largest gain in health without overshooting the maximum health, it is chosen. The other elements of the health agent refill mana and other stats. The knowledge of loss in health or death can be used to update the state information, so the agent can learn the best areas to travel and fight. This type of information will be incorporated into some heuristic that works for the type of movement the agent is currently performing. For example, the healing agent may tell the movement agent to flee battle. The movement agent should, however, be smart enough not to go into a more dangerous area while fleeing. It should be noted that the healing agent also acts as one big utility function. That is, one of the best measures of the utility of a particular state is whether the agent frequently dies in that state, or perhaps loses far too much health there.

Movement Agent:
The environment contains different rooms that the agent must detect, and this is the key problem the movement agent faces. As humans we can pattern-match blocks of text and tell when we are in a room, but this is much more difficult for the agent. Say the movement agent walks into some room by issuing the command "south". Once the agent puts the command in the actuator queue, there is no guarantee that the next sensor reading the agent gets from the environment will be the room title. That is, each state in this environment will be indexed through a hash table by the name of the room, but the agent cannot be sure of that name. After much consideration, I have decided that the best way for an unsure human to check which room he or she is in would be to issue the command again. So the agent needs some threshold (e.g. best 3 out of 5) that tells it for sure which room it is in. That is, after issuing the command X times, if the same room name is observed Y times and Y > X - Y, then we know which state we are in and can look it up in the hash table. There will be a user-specified list of movement commands that will be tried in each room. Once all commands have been tried, there will be some random chance for the movement agent to try dead choices again. It also seems necessary for the movement agent to allow for different types of movement. Perhaps the agent wants to find a shop to buy health items, find the best area to battle in, or just explore the environment. The first two situations seem perfect for an A* search, while there are still some interesting considerations for how one should blindly explore the environment. For this latter case I will use a Q-learning algorithm where the reward() function is based on each state's collective utility as set by the other agents. This means the healing, inventory, and fighting agents will update some utility value, and the movement agent will periodically run a thread that retrains it on the updated state space.
This way the agent can learn a dynamic environment. Furthermore, the epsilon-greedy value should be about 50%. We can be fairly confident about the state space, but it is guaranteed to change as the agent unlocks new locations and the administrators add new levels. There does not appear to be too heavy a penalty for this value.
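The Q-learning scheme described above can be sketched with a small tabular example. The table sizes, the global `Q` array, and the function names are hypothetical. The reward `r` stands in for the state utility that the other agents would supply. `choose_action` implements epsilon-greedy selection, and the epsilon of about 0.5 suggested above would be passed in. The update is the standard one: Q(s,a) <- Q(s,a) + alpha (r + gamma max_a' Q(s',a') - Q(s,a)).

```c
#include <stdlib.h>

#define N_STATES  4  /* hypothetical: rooms known so far */
#define N_ACTIONS 3  /* hypothetical: movement commands per room */

/* Q[s][a] estimates the long-run utility of taking action a in room s. */
static double Q[N_STATES][N_ACTIONS];

/* One tabular Q-learning update; r would come from the utility the
 * healing, inventory, and fighting agents assign to room s_next. */
void q_update(int s, int a, double r, int s_next, double alpha, double gamma)
{
    double best_next = Q[s_next][0];
    for (int i = 1; i < N_ACTIONS; i++)
        if (Q[s_next][i] > best_next)
            best_next = Q[s_next][i];
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a]);
}

/* Epsilon-greedy: with probability epsilon explore a random action,
 * otherwise exploit the action with the highest Q value. */
int choose_action(int s, double epsilon)
{
    if ((double)rand() / ((double)RAND_MAX + 1.0) < epsilon)
        return rand() % N_ACTIONS;
    int best = 0;
    for (int i = 1; i < N_ACTIONS; i++)
        if (Q[s][i] > Q[s][best])
            best = i;
    return best;
}
```

A background thread could replay recent transitions through `q_update` whenever the other agents revise room utilities, which matches the periodic retraining described above.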

 

Works Cited
Castronova, Edward. Indiana University, September 2, 2004. Accessed September 10, 2004. http://mypage.iu.edu/~castro/home.html

Perez-Uribe, Andres. Introduction to Reinforcement Learning. Accessed September 10, 2004. http://lslwww.epfl.ch/~aperez/RL/RL.html

Russell, Stuart and Norvig, Peter. Artificial Intelligence: A Modern Approach (Second Edition). Prentice Hall, 2003.

White, James. Contemporary Moral Problems (Seventh Edition). Wadsworth, 2003.