Olivier Georgeon's research blog—also known as the story of little Ernest, the developmental agent.

Keywords: situated cognition, constructivist learning, intrinsic motivation, bottom-up self-programming, individuation, theory of enaction, developmental learning, artificial sense-making, biologically inspired cognitive architectures, agnostic agents (without ontological assumptions about the environment).

Friday, January 7, 2011

The tangential strategy learning process

To get a better view on how Ernest learned the tangential strategy, let us examine his activity trace:

1 2(> |+) 3(> |+) 4(> |+) 5(> |+) 6(> |+) 7(> |o) 8(v |*) 9(v*|o) 10(>+| ) 11(^ |*) 12(^o| ) 13(^ |o) 14(^*| ) 15(>o| ) 16(>) 17(>) 18(^*| ) 19(vo| ) 20(v) 21(^) 22(>) 23(^*| ) 24(^o|*) 25(^ |o) 26(^) 27(v) 28(v |*) 29(^ |o) 30(^) 31(>) 32(^*| ) 33(^o|*) 34(v*|o) 35(>+| ) 36(^o|*) 37(v*|o) 38(>+| ) 39(^ |*) 40(v |o) 41(>o| ) 42(>) 43(^*| ) 44(^o|*) 45(> |+) 46(> |+) 47(> |o) 48(v |*) 49(v*|o) 50(>+| ) 51(^ |*) 52(>+|+) 53(v |o) 54(vo| ) 55(v |*) 56(v*| ) 57(v |o) 58(^ |*) 59(>+|+) 60(>+|+) 61(>+|+) 62(>x|x) 63(>o|o) 64(v) 65(v) 66(v) 67[v] 68(^) 69[v] 70(v) 71(v) 72(v) 73[v] 74[>] 75(^*| ) 76(^o|*) 77(> |+) 78(> |+) 79(> |+) 80(> |+) 81(> |+) 82(> |o) 83(v |*) 84(v*|o) 85(>+| ) 86(^ |*) 87(>+|+) 88(>x|x) 89(^o|o)

In this trace, the numbers indicate the cycle counter also displayed in the bottom-right corner of the video. The symbols that represent Ernest’s primitive actions read as follows: ^ turn left, > try to move forward, v turn right. These are within parentheses when they succeed and within square brackets when they fail. For example, Ernest turned toward an adjacent wall on step 73 and bumped into a wall on step 74. Steps 67 and 69 are also marked as failed turns; in all other steps in this trace, primitive schemas succeeded.

The symbols that represent the eye signals read as follows: * appear, + closer, x arrived, o disappear. These symbols are represented on each side of a | character, the left eye signal being on the left and the right eye signal on the right. For example, on step 9, Ernest turned right, the blue square appeared in the left eye’s field and disappeared from the right eye’s field. On step 10, the blue square got closer in the left field and nothing changed in the right field. On step 11, the blue square appeared in the right field and nothing changed in the left field, meaning that the blue square was then present in both eyes’ fields.
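The notation described above is regular enough to be read mechanically. The following is a minimal sketch of a parser for it, not code from Ernest himself: the names `Step`, `parse_trace`, `ACTIONS` and `SIGNALS` are made up for illustration, and treating steps that carry no eye field, such as 16(>), as "unchanged" in both eyes is an assumption.

```python
import re
from typing import NamedTuple

# Symbol tables taken from the legend in the text.
ACTIONS = {"^": "turn left", ">": "move forward", "v": "turn right"}
SIGNALS = {"*": "appear", "+": "closer", "x": "arrived",
           "o": "disappear", " ": "unchanged"}


class Step(NamedTuple):
    cycle: int       # cycle counter
    action: str      # primitive action
    succeeded: bool  # True for (...), False for [...]
    left: str        # left eye signal
    right: str       # right eye signal


# Cycle number, opening bracket, action symbol, optional "left|right"
# eye signals, closing bracket.
STEP_RE = re.compile(r"(\d+)([(\[])([\^>v])(?:([*+xo ])\|([*+xo ]))?[)\]]")


def parse_trace(trace):
    """Turn a trace string into a list of Step records."""
    steps = []
    for m in STEP_RE.finditer(trace):
        cycle, bracket, action, left, right = m.groups()
        steps.append(Step(
            cycle=int(cycle),
            action=ACTIONS[action],
            succeeded=(bracket == "("),
            # Assumption: a missing eye field means no change in either eye.
            left=SIGNALS[left or " "],
            right=SIGNALS[right or " "],
        ))
    return steps


# A few steps copied from the trace.
for step in parse_trace("9(v*|o) 10(>+| ) 11(^ |*) 73[v] 74[>]"):
    print(step)
```

Run on the full trace, filtering on `succeeded` gives the list of steps where a primitive schema failed.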

The first interesting (safe and satisfying) sequence was found right at the beginning, when Ernest moved forward and got closer to the blue square in a context where he had just moved forward and gotten closer. This experience made him repeat this sequence from step 2 to step 7, when he received a disappear signal from the right eye.

From step 7 to step 11, Ernest learned the returning sequence. Step 7: move forward, disappear on right. Step 8: turn right, appear on right. Step 9: turn right, appear on left, disappear on right. Step 10: move forward, closer on left. Step 11: turn left, appear on right. After step 11, Ernest was facing the blue square but did not yet know to move forward in this category of context, so he randomly picked a turn action.

On steps 47 through 51, Ernest enacted the returning sequence again because it had proven to work and to be satisfying in the category of context in which he found himself again. On step 52, he chose to move forward (other options had already proven uninteresting in the current category of context), obtaining a closer signal from both eyes. On step 53, however, he did not yet know to continue moving forward in the current category of context, and he randomly picked turn right.
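The reuse of the returning sequence can be checked directly against the trace. The sketch below is only an illustration: the helper names `tokenize` and `find_sequence` are hypothetical, and `EXCERPT` holds three short runs of steps copied from the trace above rather than the whole thing. It compares the contents of each bracket pair and ignores whether the step succeeded, which is enough for this pattern.

```python
import re

# Steps 7-12, 47-52 and 82-87, copied from the trace.
EXCERPT = (
    "7(> |o) 8(v |*) 9(v*|o) 10(>+| ) 11(^ |*) 12(^o| ) "
    "47(> |o) 48(v |*) 49(v*|o) 50(>+| ) 51(^ |*) 52(>+|+) "
    "82(> |o) 83(v |*) 84(v*|o) 85(>+| ) 86(^ |*) 87(>+|+)"
)

# The returning sequence as it appears on steps 7 to 11.
RETURNING = ["> |o", "v |*", "v*|o", ">+| ", "^ |*"]


def tokenize(trace):
    """Return (cycle, body) pairs, where body is the text inside the brackets."""
    pairs = re.findall(r"(\d+)[(\[]([^)\]]*)[)\]]", trace)
    return [(int(cycle), body) for cycle, body in pairs]


def find_sequence(steps, pattern):
    """Return the cycle numbers at which `pattern` starts."""
    bodies = [body for _, body in steps]
    k = len(pattern)
    return [steps[i][0]
            for i in range(len(bodies) - k + 1)
            if bodies[i:i + k] == pattern]


steps = tokenize(EXCERPT)
print(find_sequence(steps, RETURNING))  # [7, 47, 82]
```

The three starting points correspond to the first time Ernest learned the sequence, its reuse on steps 47 through 51, and its reuse on the way to the second blue square.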

On step 59, he again got closer in both eyes when moving forward (although out of a different preceding sequence). In this context, he again picked move forward on step 60, which proved satisfying and led him to continue on step 61 until he stepped on the blue square on step 62.

When the second blue square was introduced on step 75, he had thus already learned to enact the different subsequences needed for the tangential strategy, as well as to categorize contexts accordingly. In effect, he used these different subsequences in the right way until he reached the second square on step 88.

This quick learning was somewhat lucky, but we chose to report it because it led to a clean example of the tangential strategy. In other runs, Ernest may learn mixed strategies that are less prototypical. This run was, however, not so extraordinarily lucky, because behaviors are not picked randomly but rather always exploit what has been learned thus far. Chance is only used to break ties between conflicting impulses when previous knowledge cannot resolve them.
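To make the role of chance concrete, here is a minimal sketch of a selection rule in which randomness only breaks ties. It is not Ernest's actual selection mechanism: the function `select_action`, the `proposals` dictionary and its numeric weights are all hypothetical stand-ins for whatever Ernest has learned so far.

```python
import random


def select_action(proposals, rng=random):
    """Pick the action with the highest learned weight.

    `proposals` maps each action to a weight (hypothetical values that
    stand for how strongly past experience favors the action).
    Chance is used only when several actions share the top weight.
    """
    best = max(proposals.values())
    candidates = [a for a, w in proposals.items() if w == best]
    if len(candidates) == 1:
        return candidates[0]       # knowledge decides, no randomness
    return rng.choice(candidates)  # tie: pick at random among the best


# Knowledge clearly favors moving forward.
print(select_action({"move forward": 3, "turn left": 1, "turn right": 1}))

# Nothing learned yet separates the two turns, so chance decides.
print(select_action({"move forward": 0, "turn left": 1, "turn right": 1}))
```

Under this rule, a lucky run is one where the early ties happen to resolve in a helpful direction; everything after that follows from what has been learned.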

Experience shows that Ernest always learns a strategy within the first hundred steps, and that the most frequently found strategy is the diagonal strategy.
