Progress:
The Vagabond Godsflaw project has finished the framework development stage. To minimize the work of building infrastructure for TCP/IP connections and basic client-server message passing, Vagabond Godsflaw is built on top of the screen-oriented MUD client TinyFugue (http://tf.tcp.com/~hawkeye/tf/). The TinyFugue client is written in C, which means all prototype code is also in C. In addition, I have chosen the world of Aetolia (http://www.aetolia.com/) as the primary environment for testing the agent.
To interface with the environment, the agent must control some actuators and sensors. All I needed to do was find key locations in the TinyFugue code where I could monitor incoming traffic and inject outgoing traffic, so that I could call my own sensor and actuator functions respectively. Additionally, I needed ways for the user to issue commands to the agent so that he or she could populate the necessary data structures. TinyFugue already had an argument handling structure, which made this relatively easy. Once these three interfaces with the TinyFugue client were finished, I began development on Vagabond Godsflaw's basic framework.
The basic framework of the agent needed to allow for multiple sub-agent threads feeding off of shared data structures that would be persistent across invocations of the program. For this first part I used POSIX threads to spawn off each agent and a few key functions, such as the ticker and the MySQL dump function. The command to turn the agent on populates the data structures with persistent information from the MySQL database tables and spawns each of the sub-agent threads. There are four sub-agent threads that collectively make up the whole agent, which shall be discussed in detail later. Each thread must access global data that could be populated by the sensor, used by the actuator, or shared between sub-agents. There were, and will continue to be, extensive concurrency and locking concerns during development. For example, when one of the agents wishes to send a command to the environment, it must populate a queue that the actuator thread uses. Once the queue is populated, the agent broadcasts that the queue has data in it, and the actuator thread pulls the data out of the queue and sends it to the environment. Unfortunately, the process is a bit more complicated, since the actuator cannot just hammer the environment with commands. That is, under certain conditions, the actuator must wait for the environment to signal that it is ready for the command before it can send it. The threads' concurrency considerations are rivaled only by the difficulty of debugging their runtime errors. For this reason, much careful consideration has gone into the design, and will go into the future code, for these threads. The data structures are another matter altogether.
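The queue handoff described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual code: the names `cmd_queue`, `cmd_queue_push`, `cmd_queue_set_ready` and `cmd_queue_pop`, the fixed capacity, and the single `env_ready` flag are all hypothetical. The only real API used is POSIX threads (a mutex plus a condition variable that is broadcast on every state change).

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

#define CMD_MAX   256   /* longest command accepted (assumed limit) */
#define QUEUE_CAP 32    /* fixed capacity keeps the sketch simple */

typedef struct {
    char            cmds[QUEUE_CAP][CMD_MAX];
    int             head, count;
    bool            env_ready;  /* hypothetical flag: environment can take input */
    pthread_mutex_t lock;
    pthread_cond_t  changed;    /* broadcast whenever queue or flag changes */
} cmd_queue;

void cmd_queue_init(cmd_queue *q)
{
    memset(q, 0, sizeof *q);
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->changed, NULL);
}

/* Sub-agent side: returns false if the queue is full or cmd is too long. */
bool cmd_queue_push(cmd_queue *q, const char *cmd)
{
    bool ok = false;
    assert(q != NULL && cmd != NULL);
    pthread_mutex_lock(&q->lock);
    if (q->count < QUEUE_CAP && strlen(cmd) < CMD_MAX) {
        int tail = (q->head + q->count) % QUEUE_CAP;
        strcpy(q->cmds[tail], cmd);
        q->count++;
        ok = true;
        pthread_cond_broadcast(&q->changed);  /* "the queue has data in it" */
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}

/* Sensor side: record whether the environment is ready for a command. */
void cmd_queue_set_ready(cmd_queue *q, bool ready)
{
    pthread_mutex_lock(&q->lock);
    q->env_ready = ready;
    pthread_cond_broadcast(&q->changed);
    pthread_mutex_unlock(&q->lock);
}

/* Actuator side: blocks until a command is queued AND the environment
 * is ready. out must have room for CMD_MAX bytes. */
void cmd_queue_pop(cmd_queue *q, char *out)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0 || !q->env_ready)
        pthread_cond_wait(&q->changed, &q->lock);
    strcpy(out, q->cmds[q->head]);
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    pthread_mutex_unlock(&q->lock);
}
```

Waiting in a `while` loop on a combined predicate means the actuator tolerates spurious wakeups and wakes for either event (new data or readiness). A real actuator would probably clear `env_ready` after sending a command that consumes the environment's readiness; that policy is left out here.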
While it is perhaps a given that the Java or C++ programmer will use the API or STL, it is not so clear what the C programmer is going to do for standard data structures. I have found a great resource called the Generic Data Structure Library (GDSL) for C. The GDSL project can be found at http://www.nongnu.org/gdsl/, and I intend to use it for all of my basic data structure needs. I have already employed it for linked lists, queues, and hash tables. Unfortunately, there seems to be no priority queue in the library, which may mean I need to add this functionality myself. Since the priority queue may become an issue in the movement agent's A* search, I may have no other choice. As of now the agent's data is held in specific structures that are stored in these generic data structures. For example, there is a provocation structure that is stored in a linked list. The user may populate this structure to send certain provocations to the environment, which may provoke the environment to send useful information back. This data is written to a persistent database every so many seconds, and can be accessed by multiple threads. Since the GDSL is reentrant, all one must do is lock a mutex before each access of the data structure and unlock it afterwards. Since data structures are a basic subject, let us move on to the question of persistent data.
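If a priority queue does have to be written by hand for the A* search, a small array-backed binary min-heap would be one option. The sketch below is an assumption about how that might look, not part of GDSL and not the project's code; the names `pqueue`, `pq_push` and `pq_pop` and the fixed capacity are hypothetical. Like the other shared structures, it would need to be guarded by a mutex when used from several threads.

```c
#include <assert.h>
#include <stddef.h>

#define PQ_CAP 1024  /* fixed capacity (assumed) */

typedef struct {
    int   priority;  /* e.g. f = g + h in A*; the lowest value pops first */
    void *item;      /* caller-owned payload, e.g. a room/state record */
} pq_entry;

typedef struct {
    pq_entry data[PQ_CAP];
    size_t   size;
} pqueue;

static void pq_swap(pq_entry *a, pq_entry *b)
{
    pq_entry t = *a;
    *a = *b;
    *b = t;
}

void pq_init(pqueue *pq) { pq->size = 0; }

/* Returns 1 on success, 0 if the heap is full. */
int pq_push(pqueue *pq, int priority, void *item)
{
    size_t i;
    assert(pq != NULL);
    if (pq->size == PQ_CAP)
        return 0;
    i = pq->size++;
    pq->data[i].priority = priority;
    pq->data[i].item = item;
    while (i > 0) {                       /* sift the new entry up */
        size_t parent = (i - 1) / 2;
        if (pq->data[parent].priority <= pq->data[i].priority)
            break;
        pq_swap(&pq->data[parent], &pq->data[i]);
        i = parent;
    }
    return 1;
}

/* Removes and returns the item with the lowest priority, or NULL if empty. */
void *pq_pop(pqueue *pq)
{
    void  *top;
    size_t i = 0;
    assert(pq != NULL);
    if (pq->size == 0)
        return NULL;
    top = pq->data[0].item;
    pq->data[0] = pq->data[--pq->size];
    for (;;) {                            /* sift the moved entry down */
        size_t l = 2 * i + 1, r = l + 1, m = i;
        if (l < pq->size && pq->data[l].priority < pq->data[m].priority) m = l;
        if (r < pq->size && pq->data[r].priority < pq->data[m].priority) m = r;
        if (m == i)
            break;
        pq_swap(&pq->data[i], &pq->data[m]);
        i = m;
    }
    return top;
}
```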
On each invocation of the agent, its data from previous runs should still be available. For this reason I use a MySQL (http://www.mysql.com/) database to store the data structures. That is, at runtime all data lives in basic in-memory data structures for fast algorithms and access, but persistent storage is handled by MySQL rather than flat files. The need for persistent storage stems from the fact that the environment is so complicated that it requires user configuration. Additionally, since the program is designed to work with different environments, it seemed only fitting to let the user configure each agent's actuators and sensors. The basic layout of the database is shown in figure 1:
<PRE>
mysql> show tables;
+----------------+
| Tables_in_bots |
+----------------+
| bots           |
| healing        |
| provoke_env    |
| triggers       |
+----------------+
4 rows in set (0.00 sec)
mysql> describe bots;
+--------+------------------+------+-----+---------+----------------+
| Field  | Type             | Null | Key | Default | Extra          |
+--------+------------------+------+-----+---------+----------------+
| bot_id | int(10) unsigned |      | PRI | NULL    | auto_increment |
| name   | varchar(64)      |      |     |         |                |
+--------+------------------+------+-----+---------+----------------+
2 rows in set (0.00 sec)
mysql> describe healing;
+--------------+------------------+------+-----+---------+----------------+
| Field        | Type             | Null | Key | Default | Extra          |
+--------------+------------------+------+-----+---------+----------------+
| heal_id      | int(10) unsigned |      | PRI | NULL    | auto_increment |
| bot_id       | int(10) unsigned |      |     | 0       |                |
| target       | int(11)          |      |     | 0       |                |
| gain         | int(11)          |      |     | 0       |                |
| onlyInBattle | int(11)          |      |     | 0       |                |
| cmd          | text             |      |     |         |                |
+--------------+------------------+------+-----+---------+----------------+
6 rows in set (0.00 sec)
mysql> describe provoke_env;
+-------------+------------------+------+-----+---------+----------------+
| Field       | Type             | Null | Key | Default | Extra          |
+-------------+------------------+------+-----+---------+----------------+
| provoke_id  | int(10) unsigned |      | PRI | NULL    | auto_increment |
| bot_id      | int(10) unsigned |      |     | 0       |                |
| provocation | varchar(255)     |      |     |         |                |
+-------------+------------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)
mysql> describe triggers;
+------------+------------------+------+-----+---------+----------------+
| Field      | Type             | Null | Key | Default | Extra          |
+------------+------------------+------+-----+---------+----------------+
| trigger_id | int(10) unsigned |      | PRI | NULL    | auto_increment |
| bot_id     | int(10) unsigned |      |     | 0       |                |
| agent      | int(11)          |      |     | 0       |                |
| trigger    | text             |      |     |         |                |
| action     | text             |      |     |         |                |
| format     | text             |      |     |         |                |
| var_01     | int(11)          |      |     | 0       |                |
| var_02     | int(11)          |      |     | 0       |                |
| var_03     | int(11)          |      |     | 0       |                |
| var_04     | int(11)          |      |     | 0       |                |
| var_05     | int(11)          |      |     | 0       |                |
| var_06     | int(11)          |      |     | 0       |                |
| var_07     | int(11)          |      |     | 0       |                |
| var_08     | int(11)          |      |     | 0       |                |
| var_09     | int(11)          |      |     | 0       |                |
| var_10     | int(11)          |      |     | 0       |                |
+------------+------------------+------+-----+---------+----------------+
16 rows in set (0.00 sec)
Figure 1
</PRE>
There will be more tables in the future, but figure 1 should give a good idea of the type of information being stored. Notice that every table has a bot_id field. This field lets the user create and configure multiple bots, which he or she can load at will, so the user can test multiple agents in one or many worlds. This does not mean one can run multiple bots from the same runtime instance of the program, but one can run multiple instances of the program at the same time. This design consideration is an important feature for testing the agent, which shall be discussed later on. When the user sends the "/godsflaw -o1" command, the program checks the current state of the agent and turns it on or off. On initialization the agent loads all stored information into the data structures and sets up a dump sub-agent that listens for broadcasts. There is also an agent, the ticker, that sends a broadcast every so many seconds. Each time the dump agent receives a broadcast, it writes every persistent data structure out to its respective table. This concludes the discussion of persistent data storage, but raises the interesting consideration of the agent's command infrastructure.
Commands are handled through the normal TinyFugue channels, as discussed above. The interesting question is not how the command channel is implemented, but which commands the user can run. Figure 2 contains the output of the "/godsflaw -h" command:
<PRE>
% Usage: /godsflaw [-bhls] [-c] [-o]
% [-d] [-p""] [-P]
% [-T]
% [-t"" -a -A""]
% [-t"" -a -f"" -v""]
%
% NOTE: leave NO spaces between option and argument
%
% -h print this usage message
% -l list current bots with names and bot_id
% -o turn a bot with bot_id on or off (see -l)
%
% NOTE: bot must be ON for following commands:
%
% -p add provocation string to bot ""
% -P remove provocation from bot
% -s list all provocations
% -T remove trigger from bot
% -b list all of bot's triggers
% -t add trigger string to bot ""
% -a set agent for some action trigger
% -A add action string to bot ""
% -f add format string to bot ""
% -v add variables for format string ""
%
% NOTE: bot must be OFF for following commands:
%
% -c create new bot with
% -d delete bot with
Figure 2
</PRE>
As can be seen in figure 2, some commands work when the agent is off and others when the agent is on. When the agent is off, one can list, create, or remove an agent. Having listed the agents, he or she can use a bot_id to turn that agent on or off. Once the bot is on, one can list, add, or remove a provocation, or list, add, or remove a trigger. Adding a trigger is the most complex of these options, and it must be understood for a thorough treatment of the sub-agents.
Triggers are the mechanism through which the user can get the agent to act. When the agent observes a trigger, it must take some action that the user specifies. The most basic trigger, for completeness, is the trigger/action pair. The command may look like "/godsflaw -t"you have fallen" -a0 -A"stand"", which tells the default agent to send the command "stand" when it sees the string "you have fallen". There is really no interesting AI application here, so we will move on to the more interesting trigger. The next trigger tells some sub-agent to capture information from the environment and fill variables with it. For example, the command
"/godsflaw -t"Your health is" -a1 -f"Your health is %s." -v"5"" will tell the health agent to fill the current health variable. One can see the options to the -v and -a arguments in figure 3 by issuing the commands "/godsflaw -v" and "/godsflaw -a" respectively.
<PRE>
/godsflaw -v
% - This is a string of variable IDs that
% - will be used with for format option.
% - Each ID should be seperated by a space
% - and there should be no more than 10
% - IDs. The option should look like this:
% - -v"8 6 5 3 2 1"
% - You will find ID to variable mappings
% - below. NOTE: A means the agents that
% - you would pass on the command line with
% - the -a option. You MUST not cross
% - agent IDs in the variable string. That
% - is, -v"8 9" would be BAD
% - given the table below, because 8 and 9
% - are handled by different agents. The T
% - field is the type this variable will be
% - converted to. So ID 3 below will be
% - changed to and integer number. That
% - means if you get something other than
% - a number from the format string into
% - variable 3, expect unpredictable
% - results.
% - ---------------------------------------
% - | ID | A | T | Description            |
% - ---------------------------------------
% - |  1 | 1 | i | max health             |
% - |  2 | 1 | i | max mana               |
% - |  3 | 1 | i | max endurance          |
% - |  4 | 1 | i | max willpower          |
% - |  5 | 1 | i | current health         |
% - |  6 | 1 | i | current mana           |
% - |  7 | 1 | i | current endurance      |
% - |  8 | 1 | i | current willpower      |
% - |  9 | 0 | s | equilibrium            |
% - | 10 | 0 | s | balance                |
% - | 11 | 0 | s | prone                  |
% - ---------------------------------------
/godsflaw -a
% - ----------------------------------
% - You must choose an agent by number
% - This agent will handle the trigger
% - ----------------------------------
% - 0 = general agent
% - 1 = healing agent
% - 2 = movement agent
% - 3 = inventory agent
% - 4 = fighting agent
% - ----------------------------------
Figure 3
</PRE>
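The rule in figure 3 that a -v string must not mix variables handled by different agents could be checked along these lines. This is a sketch under assumptions: the table contents are copied from figure 3, but the names `var_info`, `VAR_TABLE` and `parse_var_ids` are hypothetical and the real option parser may work quite differently.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_VARS 10  /* figure 3: no more than 10 IDs per -v string */

typedef struct {
    int  id;     /* variable ID */
    int  agent;  /* agent that owns it (column A) */
    char type;   /* 'i' = integer, 's' = string (column T) */
} var_info;

/* ID to agent/type mapping, copied from figure 3. */
static const var_info VAR_TABLE[] = {
    {1, 1, 'i'}, {2, 1, 'i'}, {3, 1, 'i'},  {4, 1, 'i'},
    {5, 1, 'i'}, {6, 1, 'i'}, {7, 1, 'i'},  {8, 1, 'i'},
    {9, 0, 's'}, {10, 0, 's'}, {11, 0, 's'}
};

static const var_info *var_lookup(long id)
{
    size_t i;
    for (i = 0; i < sizeof VAR_TABLE / sizeof VAR_TABLE[0]; i++)
        if (VAR_TABLE[i].id == id)
            return &VAR_TABLE[i];
    return NULL;
}

/* Parses a space-separated ID string such as "8 6 5 3 2 1" for the agent
 * given with -a. Returns the number of IDs stored in ids[], or -1 if the
 * string contains an unknown ID, an ID owned by another agent, more than
 * MAX_VARS IDs, or anything that is not a number. */
int parse_var_ids(const char *arg, int agent, int ids[MAX_VARS])
{
    const char *p = arg;
    int n = 0;
    assert(arg != NULL && ids != NULL);
    for (;;) {
        char *end;
        long id = strtol(p, &end, 10);
        const var_info *v;
        if (end == p)
            break;                      /* no further number found */
        v = var_lookup(id);
        if (v == NULL || v->agent != agent || n == MAX_VARS)
            return -1;
        ids[n++] = (int)id;
        p = end;
    }
    while (*p == ' ')
        p++;
    return (*p == '\0') ? n : -1;       /* reject trailing garbage */
}
```

With this check, -v"8 9" is rejected for agent 1 because ID 9 belongs to agent 0, matching the warning in the help text.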
The trigger command above is important for filling each agent's variables. The assumption is that every world has some generic set of commands whose output can fill these variables. Once the variables for an agent are filled, that agent can perform within the environment. This leads us into the sub-agent discussion.
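To make the format-string capture concrete, here is a rough sketch of how a line such as "Your health is 312." might be matched against the format "Your health is %s." and the captured text converted for an integer-typed variable. It is an assumption for illustration only: `capture_format` and `capture_to_long` are hypothetical names, only the `%s` placeholder is handled, and the sample values are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CAP_LEN 64  /* longest captured value kept (assumed limit) */

/* Matches line against fmt, where each "%s" in fmt captures text up to the
 * literal character that follows it in fmt (or to the end of the line).
 * Returns the number of captures, or -1 if the line does not match.
 * Text after the end of fmt is ignored. */
int capture_format(const char *fmt, const char *line,
                   char caps[][CAP_LEN], int max_caps)
{
    int n = 0;
    assert(fmt != NULL && line != NULL && caps != NULL);
    while (*fmt != '\0') {
        if (fmt[0] == '%' && fmt[1] == 's') {
            char   stop = fmt[2];   /* '\0' means capture to end of line */
            size_t len = 0;
            if (n == max_caps)
                return -1;
            while (*line != '\0' && *line != stop) {
                if (len + 1 < CAP_LEN)      /* silently truncate long values */
                    caps[n][len++] = *line;
                line++;
            }
            caps[n][len] = '\0';
            if (len == 0)
                return -1;                  /* nothing captured */
            n++;
            fmt += 2;
        } else {
            if (*line != *fmt)
                return -1;                  /* literal text differs */
            line++;
            fmt++;
        }
    }
    return n;
}

/* Converts a capture for a type 'i' variable. Returns 1 and stores the
 * value in *out only if the whole string is a decimal number, else 0. */
int capture_to_long(const char *s, long *out)
{
    char *end;
    long  v = strtol(s, &end, 10);
    if (end == s || *end != '\0')
        return 0;
    *out = v;
    return 1;
}
```

Rejecting non-numeric captures explicitly is one way to avoid the "unpredictable results" that the help text in figure 3 warns about.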
Before we go into the sub-agent discussion, it is worth mentioning that the infrastructure seems to be behaving properly. That is, any future data structures or user-configurable commands should be relatively easy to drop in place, which should leave me more time for agent development. That being said, much of my time has been spent thinking about how to build a solid infrastructure, and less on each agent's implementation. I am halfway through the healing agent's design and testing, as will be discussed below, but I have very few ideas on how the inventory and fighting agents are going to work. The problem I face here is that, while there are parallels between the behaviors of the healing, inventory, and fighting agents, I am not so sure the reinforcement learning work I have done so far is sufficient. That is, if I had taken on a similar project before exploring reinforcement learning, I'm not so sure I would have done it any differently. I must stress that I only feel this way with regard to the healing agent, and it is something I would like to fix in the fighting and inventory agents.
While it is the healing agent I chose to develop first, it is the movement agent that will pull everything together. The environment contains different rooms that the agent must detect. This is the key problem the movement agent faces. As humans we can pattern match blocks of text and tell when we are in a room, but for the agent this is much more difficult. Say the movement agent walks into some room by issuing the command "south". Once the agent puts the command in the actuator queue, there is no guarantee that the next sensor reading the agent gets from the environment will be the room title. That is, each state in this environment will be indexed through a hash table by the name of the room, but the agent cannot be sure of that name. After much consideration on the matter, I have decided that the best way for an unsure human to check which room he or she is in would be to issue the command again. So for the agent there must be some threshold (e.g. best 3 out of 5) that tells it for sure which room it's in. That is, after issuing the command X times, if Y of the responses name the same room and Y is greater than X - Y, then we know which state we are in and can look it up in the hash table. There will be a user-specified list of movement commands that will be tried in each room, and once all commands have been tried there will be some random chance for the movement agent to try dead choices over again. It also seems necessary for the movement agent to allow for different types of movement. Perhaps the agent wants to find a shop to buy health items, or the best area to battle in, or just to explore the environment. The first two situations seem perfect for an A* search, while there are still some interesting considerations for how one should just blindly explore the environment. There may even be a tracking mode that the agent can be put into. These are all worthwhile areas of thought when making the movement agent, but my head is still with the health agent.
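The "best Y out of X" threshold amounts to a strict majority vote over the observed room titles. A minimal sketch, assuming the X candidate titles have already been collected into an array (the function name `confirm_room` and the sample titles in the usage note are hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Given the n candidate titles observed after issuing the command n times,
 * returns the title seen y times with y > n - y (a strict majority),
 * or NULL if no title reaches that threshold. */
const char *confirm_room(const char *titles[], int n)
{
    int i, j;
    assert(titles != NULL);
    for (i = 0; i < n; i++) {
        int y = 0;
        for (j = 0; j < n; j++)
            if (strcmp(titles[i], titles[j]) == 0)
                y++;
        if (y > n - y)
            return titles[i];   /* safe to use as the hash table key */
    }
    return NULL;                /* still unsure: observe again */
}
```

For example, if five observations yield "Town Square" three times and two unrelated lines of combat or weather text, the function returns "Town Square"; with a two-against-two split it returns NULL and the agent would have to issue the command again.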
For the health agent I have already completed much of the work. In the environment, if I gain or lose health, the agent can detect that. If I lose health, for instance, the agent evaluates whether I am in danger of dying and will heal me if needed. Each healing command belongs to one of two sets: either it can or cannot be used in battle. If the agent is in battle, it will not use one of the non-battle commands. Each command has a field called gain, as can be seen in the healing table in figure 1. This field records the last gain in health the command provided. This way we can get a fairly accurate measure of how much a single command can heal. This is then compared against all other commands. The command that will give the largest gain in health without exceeding the maximum health is chosen. The other elements of the health agent refill mana and other stats. The knowledge of loss in health or death can be used to update the state information so the agent can learn the best areas to travel and fight in. This type of information will be incorporated into some heuristic that works for the type of movement the agent is currently performing. For example, the healing agent may tell the movement agent to flee battle. The movement agent should, however, be smart enough not to go into a more dangerous area while fleeing.
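The selection rule for healing commands might be sketched as below. This is an illustration under assumptions rather than the agent's actual code: the struct `heal_cmd`, the `usable_in_battle` flag (the exact meaning of the table's onlyInBattle column is not spelled out here), the function `choose_heal`, and the sample commands in the usage note are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    const char *cmd;               /* text sent to the environment */
    int         gain;              /* last observed gain in health */
    int         usable_in_battle;  /* assumed flag: nonzero if allowed in battle */
} heal_cmd;

/* Picks the command with the largest recorded gain that would not push
 * current health past max health, skipping non-battle commands while in
 * battle. Returns NULL if no command qualifies. */
const heal_cmd *choose_heal(const heal_cmd *cmds, size_t n,
                            int cur_health, int max_health, int in_battle)
{
    const heal_cmd *best = NULL;
    size_t i;
    assert(cmds != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (in_battle && !cmds[i].usable_in_battle)
            continue;
        if (cur_health + cmds[i].gain > max_health)
            continue;                       /* would overshoot max health */
        if (best == NULL || cmds[i].gain > best->gain)
            best = &cmds[i];
    }
    return best;
}
```

With commands "drink health" (gain 100, battle), "eat bread" (gain 40, non-battle) and "sip tonic" (gain 60, battle), a bot at 200 of 350 health would choose "drink health", while at 300 of 350 outside battle it would choose "eat bread". What to do when nothing qualifies (for instance, fall back to the smallest heal) is a policy decision left open here.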
The inventory and fighting agents have some interesting considerations of their own. Nevertheless, I am still thinking of these agents as secondary to the primary project. That is, I am focusing my effort on the healing and movement agents so as not to overwhelm myself. If I get to these in time, I would like to address the concerns above. Namely, I want a thorough and fair evaluation of the best reinforcement learning approach for this context. But enough of the framework and agent design; let's discuss testing procedures.
Testing and Evaluation:
I shall address the two issues, testing and evaluation, separately. For testing there are a few approaches that should yield good results. First, TinyFugue is built with a good debugging facility. That is, when it is run from GDB and one reproduces an error that leads to a fatal runtime bug, TinyFugue seems to catch the exception and inform the user of the GDB command that generates a stack trace. This information has been quite useful in catching those catastrophic runtime failures. In addition to this, I am planning on adding, if it does not already exist, a logging facility for all incoming and outgoing data. This should allow the bot to run for extended periods of time, and if it behaves strangely in some situation, I can do a post-mortem on the logs. Perhaps the most useful ability I built in was that of running multiple bots. This way I can set up numerous bots, running from different invocations of the program, to evaluate a spectrum of their behaviors. This means I can catch a poor choice in the code that may have been made too specific to the character the bot was playing at the time. I have also solicited two teenagers who live on the same island I do to test the bot. This means I will get user feedback, which should give my program a fair shake.
The second issue, evaluation, should be taken care of through the agent's ability to dump all data to the database. That is, I can evaluate the improvements in the bot's actions by looking at the data in the database. For example, I can see all the state information and how many times the bot dies, so I can compare the information recorded at the beginning of some period of time, say before I go to sleep, with that recorded at the end, say when I wake up. Ultimately there are many dimensions along which the bot can improve. Finding new areas of the map is one, not dying is two, taking less time in battle is three, and choosing the optimal attacks is four. There is an extremely large set of actions that may be used as a metric for the agent's success. Again, it would seem best to segment these into smaller sections for analysis. Since the healing agent is concerned with a different function than the fighting agent, it makes sense to judge their success differently. That being said, since it is not yet clear that there will be time for the fighting and inventory agents, I will focus on those areas of interest to the healing and movement agents.
The best metric for measuring the success of the healing agent and movement agent collectively is how often the agent dies. This should be awful at first but improve substantially with time. A falling death rate means that the health agent is healing more effectively and the movement agent is avoiding dangerous areas better. There are, of course, different considerations for each agent, but this is the ultimate evaluation. Again, collecting this information over time, perhaps in an automated fashion, would be the best approach.