Olivier Georgeon's research blog—also known as the story of little Ernest, the developmental agent.

Keywords: situated cognition, constructivist learning, intrinsic motivation, bottom-up self-programming, individuation, theory of enaction, developmental learning, artificial sense-making, biologically inspired cognitive architectures, agnostic agents (without ontological assumptions about the environment).

Tuesday, October 21, 2008

Remarks on Ernest's Soar implementation

Several points about Ernest's implementation in Soar are worth noting:

- I do not use Soar's input and output functions. My Soar model implements both Ernest and his environment. From Soar's viewpoint, the model does not interact with any outside environment; it evolves by itself. It is only we, as observers, who understand it as an agent interacting with an environment.

- Ernest's memory does not match the classical Soar memory definition. From the viewpoint of Ernest's design, Ernest stores schemas in his long-term memory and the current situation in his short-term memory. From the Soar viewpoint, however, both the schemas and the situation are stored in what Soar vocabulary calls working memory, usually considered declarative and semantic (a sketch of one possible working-memory layout for a schema is given after this list). Soar modelers might therefore think that Ernest learns semantic knowledge, but that would be misleading: from Ernest's viewpoint, this knowledge has no semantics; it consists only of behavioral patterns and should thus be called procedural knowledge.

- I do not describe Ernest's action possibilities as operators, contrary to what Soar expects. Instead, I describe them as schemas, and my model uses a single Soar operator to generate the action that results from the evaluation of the schemas (see the proposal sketch after this list).

- I cannot use Soar's built-in reinforcement learning mechanism, for two reasons. First, it only applies to operators, and I need to reinforce schemas, not operators. Second, Soar's reinforcement learning is designed to let the modeler define rewards coming from the environment; from these rewards, Soar computes operator preferences through an algorithm over which I have insufficient control. In my case, Ernest's behavior is not driven by rewards sent to him as inputs, but by internal preferences for certain types of schemas. Soar's reward mechanism therefore does not help me, and I have to implement my own reinforcement mechanism, which simply increases a schema's weight each time the schema is enacted (a sketch of such a rule appears after this list).

- So far, I do not use Soar's impasse mechanism. When Ernest has no knowledge to choose between two or more schemas, he simply picks one at random (the indifferent preference in the proposal sketch below is one way to obtain this behavior).

- I do not use Soar's default probabilistic action-selection mechanism. The idea that there should be an epsilon probability that Ernest does not choose his preferred action is absurd in this context: it only impedes the exploration and learning process. I force the epsilon value to zero (see the settings line after this list).
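
To make the memory remark concrete, here is a minimal sketch of how one schema might be laid out in working memory. The attribute names (context, action, expectation, weight) are illustrative assumptions, not necessarily the ones used in the actual model:

    (<s> ^schema <sch1>)
    (<sch1> ^context <c1>         # situation in which the schema applies
            ^action <a1>          # act that the schema performs
            ^expectation <e1>     # outcome that the schema anticipates
            ^weight 3)            # weight accumulated through enaction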
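
Here is a minimal sketch, under the same naming assumptions, of how the single operator could be proposed, with one instantiation per candidate schema. The ^activated flag is a hypothetical marker for schemas that match the current situation, and the indifferent preference (=) tells Soar that the instantiations are mutually indifferent, so selection among equally weighted schemas is random:

    sp {ernest*propose*enact
       (state <s> ^schema <sch>)
       (<sch> ^activated true)
    -->
       (<s> ^operator <o> + =)
       (<o> ^name enact
            ^schema <sch>)}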
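
And a sketch of the hand-written reinforcement rule. Because the rule tests the selected operator, its changes receive o-support and persist after the operator retracts, so the enacted schema's weight is permanently incremented by one:

    sp {ernest*apply*enact*reinforce
       (state <s> ^operator <o>)
       (<o> ^name enact
            ^schema <sch>)
       (<sch> ^weight <w>)
    -->
       (<sch> ^weight <w> -)        # remove the old weight
       (<sch> ^weight (+ <w> 1))}   # add the incremented weight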
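
Finally, the exploration parameter mentioned in the last remark can be forced to zero with Soar's indifferent-selection command, placed in the agent file alongside the productions (the exact flag may vary across Soar 9 releases):

    indifferent-selection --epsilon 0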

In conclusion, my usage of Soar clearly does not correspond to what Soar was created for: Soar was designed to represent the modeler's knowledge, not to develop agents who construct their own knowledge from their own activity. Nevertheless, Soar has so far proven flexible enough to be usable for my approach, and it provides me with the powerful and efficient graph-manipulation facilities that are essential for Ernest.
