Hybrid Architecture for Cognitive Agents Oscar Romero López

Cognitive Architecture – Big Picture

Perception & WM
Sensors: raw data, e.g. 0.8976, 0.045676, 0.15678, 0.9987, …
Interpreter: turns raw data into percepts (p1, p2, p3), each described by dimension-value pairs. Example: dimension 2: observed-object-position; values: right, left, in-front.
[Diagram: Sensors → Interpreter → Working Memory; percepts p1, p2, p3 stored as Dim 1, Dim 2, Dim 3 with values Val 1, Val 2; label OM]
Working Memory:
- Time constraint (7 secs.)
- Base Level Activation (Interpreter)
- Dimension-Value pairs from LT memory
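A minimal sketch of the working-memory time constraint, under the assumption that the "7 secs." limit means a percept is dropped once it is older than 7 seconds. The class and field names (`Percept`, `WorkingMemory`, `time_limit`) are hypothetical, and base-level activation is not modelled here.

```python
from dataclasses import dataclass, field


@dataclass
class Percept:
    pairs: dict        # dimension -> value, e.g. {"observed-object-position": "left"}
    created_at: float  # time (in seconds) at which the percept entered WM


@dataclass
class WorkingMemory:
    time_limit: float = 7.0  # assumed reading of the 7-second time constraint
    items: list = field(default_factory=list)

    def add(self, pairs: dict, now: float) -> None:
        """Store a new percept stamped with the current time."""
        self.items.append(Percept(dict(pairs), now))

    def refresh(self, now: float) -> list:
        """Drop percepts older than time_limit and return the survivors."""
        self.items = [p for p in self.items
                      if now - p.created_at <= self.time_limit]
        return self.items


wm = WorkingMemory()
wm.add({"observed-object-position": "left"}, now=0.0)
wm.add({"observed-object-position": "in-front"}, now=5.0)
remaining = wm.refresh(now=8.0)  # the first percept is 8 s old and is dropped
```

Passing the clock value explicitly (`now`) keeps the sketch deterministic; a robot controller would more likely read a monotonic clock.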

Simple Decision Making Cycle
[Diagram: Perception Module → Behavioural Module (Basal Ganglia) → Motor Module, with Behaviour Modulation]
Behaviours and the percepts that feed them:
- Avoid-Obstacles: perp 2, perp 9, perp 10, …
- Look-For-Box: perp 1, perp 3, perp 11, …
- Pick-Up-Box: perp 3, perp 4, perp 8, …
- Store-Box
- Charge-Battery
- …
How can the robot decide which behaviour must be activated?

Bio-inspired Bottom-Up Approach
- Epigenesis: Context-Dependent Behaviours Implemented; Hybrid Behaviours
- Ontogenesis: Task Modules driven by Behaviour Networks and evolved by GEP
- Phylogenesis: Task refinement through both GEP and a co-evolutionary mechanism
- High-order cognitive skills: plan extraction, problem solving, …

Behaviors…
Context-Dependent Behaviours Implemented:
- Avoid-Obstacles
- Look-For-A-Storage
- Look-For-Battery-Charger
- Look-For-A-Box
- Look-For-B-Box
- Look-For-Object
- Pick-Up-A-Box
- Pick-Up-B-Box
- Pick-Up-Object
- Store-A-Box
- Store-B-Box
- Store-Object
- Charge-Battery
What makes them context-dependent:
- Same sensory input
- Same action output
- Same feedback signal pattern
- Different internal/external state

The Epigenetic Approach: (Learning and Adaptation)
Cell differentiation (Biol.). The epigenetic approach (EA) defines the mechanisms that allow an individual (agent) to modify some aspects of its internal/external structure as a result of its interaction with the surrounding environment.
Behaviour Specialization:
- Production Rules: Expert Rules (ER), Sub-symbolic Extraction Rules
- Artificial Immune Systems
- Backpropagation Neural Networks

Connectionist Module
Backpropagation Neural Networks
- Straight BP: supervised learning algorithm (off-line)
- Reinforcement BP: Q-learning for reinforcement learning (on-line)

Connectionist Module (Charge-Battery Behavior)

Artificial Immune System
[Diagram components: Sensory Input (percepts) as Antigen; Antigen-Antibody Matching; Antibodies Repertoire; Credit distribution system; Meta-dynamics (Genetic Algorithm); Actuators]

Artificial Immune System
Initial antibody (B-C Behaviour)
condition:
  0    Battery charged
  1    Charger is far
  #
  0    Charger Posi: left
  1    Carrying box?
action:
  0 0  Speed: Go forward
  1 0  Turn Rate: Turn-right
  0    Gripper St: close
Evolved antibody (Meta-dynamics + Genetic Algorithm)
condition: 0 1 # 0 1
action: 0 0; Turn-left; 1 Gripper open
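The antibody conditions on this slide are strings over 0, 1 and #. A minimal sketch of antigen-antibody matching follows, assuming `#` is a "don't care" position as in classifier-system notation (the slide does not define it). The function name `matches` and the sample antigens are hypothetical.

```python
def matches(condition: str, antigen: str) -> bool:
    """Return True if every condition position equals the antigen bit
    or is the wildcard '#' (assumed to mean "don't care")."""
    if len(condition) != len(antigen):
        raise ValueError("condition and antigen must have the same length")
    return all(c == "#" or c == a for c, a in zip(condition, antigen))


# The condition string "01#01" appears on the slide; the antigens are made up.
condition = "01#01"
hit_a = matches(condition, "01001")  # wildcard position holds 0
hit_b = matches(condition, "01101")  # wildcard position holds 1
miss = matches(condition, "11001")   # first bit differs
```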

Sub-symbolic Extraction Rules
[Diagram: NN and AIS → extract → SER bit strings (1011001…, 0011011…) → rules over dimension-value pairs, e.g. (dim 1: val 2) (dim 3: val 4) …; each rule can then be generalized or specialized.]
Rule forms shown:
- Rule n: (dim 1: val 1, val 2) (dim 3: val 3, val 4, … val 6) …
- Rule n1: (dim 1: val 1) (dim 3: val 3) …
- Rule n2: (dim 1: val 2) (dim 3: val 4) …
- Rule n3: (dim 1: val 1) (dim 3: val 3) …
- Rule n4: (dim 1: val 2) (dim 3: val 6) …
Conditions shown: if IG(C, all) > threshold 1; if IG(C, all) < threshold 2

Integrating the action recommendations
For Behavior n, four sources give action recommendations as bit strings (e.g. Act: 1 0 1 1 …): ER, SER, AIS and NN, with weights w.ER, w.SER, w.AIS and w.NN.
BLA: 0.56, 0.98, 0.1, 0.4
Integrated action: 0111…
Boltzmann distribution, symbols: p (probability), x (current state), i, j (matching rules; j is a range), τ (temperature), U.
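The slide names a Boltzmann distribution with temperature τ but its formula did not survive. A hedged sketch of the standard form, p(i | x) = exp(U(x, i)/τ) / Σ_j exp(U(x, j)/τ), is given below; whether the author used exactly this form is an assumption. Reusing the slide's four BLA values as the utilities U is purely illustrative, and the function names are hypothetical.

```python
import math
import random


def boltzmann_probabilities(utilities, temperature):
    """Standard Boltzmann (softmax) probabilities over a list of utilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    top = max(utilities)  # subtract the max for numerical stability
    weights = [math.exp((u - top) / temperature) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def select_index(utilities, temperature, rng=random):
    """Sample one index according to the Boltzmann probabilities."""
    probs = boltzmann_probabilities(utilities, temperature)
    return rng.choices(range(len(utilities)), weights=probs, k=1)[0]


# Illustrative only: the four BLA values from the slide used as utilities
# for ER, SER, AIS and NN.
utilities = [0.56, 0.98, 0.1, 0.4]
probs = boltzmann_probabilities(utilities, temperature=0.5)
choice = select_index(utilities, temperature=0.5, rng=random.Random(0))
```

A high temperature flattens the distribution toward uniform choice; a low temperature approaches picking the source with the largest utility.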

Ontogenetic Approach: (Development)
This approach permits the development of a given functionality from the information stored in the agent's genome.
[Diagram: TASK → Goal → Behavior Network, with a Focus Manager; behaviours connected by Activation Links and Inhibition Links]
Each behaviour has:
- Precondition List: Precond 1, Precond 2, Precond 3, …
- Add List: state 1, state 2, …
- Del List: state 1, state 2, …
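A minimal sketch of a behaviour with its precondition, add and delete lists, and of how links between behaviours could be derived from them. The link rules used here (an activation link where one behaviour adds a precondition of another, an inhibition link where one behaviour deletes a precondition of another) are an assumption in the style of Maes' behaviour networks. All proposition names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Behavior:
    name: str
    preconditions: set = field(default_factory=set)
    add_list: set = field(default_factory=set)
    delete_list: set = field(default_factory=set)

    def is_executable(self, state: set) -> bool:
        """A behaviour is executable when all its preconditions hold."""
        return self.preconditions <= state

    def apply(self, state: set) -> set:
        """Return the state after removing the delete list and adding the add list."""
        return (state - self.delete_list) | self.add_list


def activation_links(behaviors):
    """(a, b) pairs where a's add list contains a precondition of b."""
    return [(a.name, b.name) for a in behaviors for b in behaviors
            if a is not b and a.add_list & b.preconditions]


def inhibition_links(behaviors):
    """(a, b) pairs where b's delete list removes a precondition of a."""
    return [(a.name, b.name) for a in behaviors for b in behaviors
            if a is not b and a.preconditions & b.delete_list]


# Hypothetical propositions for two of the behaviours named in the slides.
look = Behavior("Look-For-Box", {"no-box-seen"}, {"box-seen"}, {"no-box-seen"})
pick = Behavior("Pick-Up-Box", {"box-seen"}, {"box-grasped"}, {"box-seen"})

state = look.apply({"no-box-seen"})
links = activation_links([look, pick])
```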

Ontogenetic Approach: (Development)

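Ontogenetic Approach: (Development)
[Diagram: two behaviour networks over behaviours A, B and C, one for Task 1: Box-Collecting and one for Task 2: Box-Piling-Up, each with its own values for the parameters π, θ, φ, δ, γ (values shown: 1, 0.45, 0.25, 0.3 and 0.8, 0.6, 0.45, 0.8, 0.7).]
Operations:
- Modify Parameters
- Create new Behaviours
- Modify existing Links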

Ontogenetic Approach using GEP
[Diagram: behaviour networks BN 1, BN 2, … BN n, each with a goal (Goal 1, Goal 2, Goal 3) and parameters (a1, b1, …; a2, b2, …; a3, b3, …) over behaviours B1, B2, B3, B5, B6, …. The networks correspond to ADF genes G1, G2, … Gn (and Gk+1 … Gkn), grouped by homeotic genes HG1 … HGk into Chromosome 1 … Chromosome n. Relations shown between genes in the Hierarchical Task Network: "G1 facilitates", "G2 limits", "G1 enables".]

Ontogenetic Approach using GEP
[Diagram: Plan n as an HTN. Tasks (Task 1, Task 2, Task 3, …) have goals (Goal 1, Goal 2, …); behaviours (Beh. 1 to Beh. 6), each with its params and add list, are combined through AND nodes; tasks are related by "enables" links.]

Ontogenetic Approach - Fitness
Purpose: combine multiple fitness functions into a single aggregate scalar fitness function.
- R: the selection range, used as a limit for selection to operate (100)
- P(ij): the value predicted by the individual program i for fitness case j (out of n fitness cases)
- Tj: the target value for fitness case j (precision of 0.01)
Feedback terms:
- neg. feedback: number of ill-formed structures
- neg. feedback: number of contradictory links (addList vs deleteList)
- pos. feedback: inverse of the number of activation cycles needed to activate the next behavior in the current task
- pos. feedback: inverse of the number of steps for a BN to achieve a goal (1 / steps)
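The symbols R, P(ij) and Tj match the absolute-error fitness commonly used in GEP, f_i = Σ_j (R − |P(ij) − Tj|), where an error within the precision counts as zero. The slide's own formula is not recoverable, so the sketch below assumes that standard form and leaves out the positive and negative feedback terms. The function name and the sample values are hypothetical.

```python
def absolute_error_fitness(predictions, targets,
                           selection_range=100.0, precision=0.01):
    """Sum over fitness cases of (R - |P - T|), with errors inside the
    precision treated as zero. Assumed standard GEP form, not taken
    verbatim from the slide."""
    if len(predictions) != len(targets):
        raise ValueError("one prediction per fitness case is required")
    total = 0.0
    for predicted, target in zip(predictions, targets):
        error = abs(predicted - target)
        if error <= precision:
            error = 0.0  # close enough to count as a perfect hit
        total += selection_range - error
    return total


# Two made-up fitness cases: the first is within precision, the second is off by 0.5.
fitness = absolute_error_fitness([1.0, 2.0], [1.005, 2.5])
best_possible = 2 * 100.0  # n fitness cases times R
```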

Ontogenetic Approach - Flow
• Generate a pseudo-random population of chromosomes (plans)
• For each chromosome i at iteration j
• Calculate the fitness function
• Validate whether the chromosome fits the current and past goals
• Integrate the behavior activation (Borda voting method)
• Apply local genetic operators: selection and replication, mutation, gene transposition, and gene recombination
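The flow integrates behavior activation with the Borda voting method. A minimal sketch of a Borda count follows, assuming each voter (for example, each plan) supplies a full ranking of the same behaviours, best first. How the original system builds these rankings is not stated; the rankings below are made up.

```python
def borda_scores(rankings):
    """Give n-1 points to the first place, n-2 to the second, ..., 0 to the last,
    and sum the points over all rankings."""
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for position, candidate in enumerate(ranking):
            scores[candidate] = scores.get(candidate, 0) + (n - 1 - position)
    return scores


def borda_winner(rankings):
    """Return the candidate with the highest Borda score (ties broken alphabetically)."""
    scores = borda_scores(rankings)
    return max(sorted(scores), key=scores.get)


# Hypothetical rankings of three behaviours by three voters.
rankings = [
    ["Look-For-Box", "Avoid-Obstacles", "Charge-Battery"],
    ["Avoid-Obstacles", "Look-For-Box", "Charge-Battery"],
    ["Look-For-Box", "Charge-Battery", "Avoid-Obstacles"],
]
scores = borda_scores(rankings)
winner = borda_winner(rankings)
```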

Phylogenetic Approach: Co-evolution
[Diagram: Behavior Co-evolution. Agent 1, Agent 2 and Agent 3 each hold behaviours B1, B2 and B3. Each behaviour has its own Behavior Repository and its own Memetic Algorithm (one for B1, one for B2, one for B3).]

Q-learning
Updating of Q(x, a)
- γ: discount factor that favors reinforcement received sooner relative to that received later
- a_i: an action that can be performed at step i (with a_0 = a)
- r_i: the reinforcement received at step i (positive, negative, or zero)
- e(y): max_b Q(y, b)
- x, y: sensory input (internal and external), WM items, current goal
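A minimal tabular sketch of one Q-learning update, assuming the standard rule Q(x, a) ← Q(x, a) + α (r + γ e(y) − Q(x, a)) with e(y) = max_b Q(y, b). The learning rate α and the dictionary-based table are assumptions; the slides describe Q-learning inside a backpropagation network, where the same error term would instead drive the weight update.

```python
from collections import defaultdict


def q_update(q, x, a, r, y, actions, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update for taking action a in state x,
    receiving reinforcement r and arriving in state y."""
    e_y = max(q[(y, b)] for b in actions)   # e(y) = max_b Q(y, b)
    td_error = r + gamma * e_y - q[(x, a)]  # temporal-difference error
    q[(x, a)] += alpha * td_error
    return q[(x, a)]


# Hypothetical states and actions.
actions = ["turn-left", "turn-right"]
q = defaultdict(float)          # unseen (state, action) pairs start at 0
q[("near-box", "turn-left")] = 1.0
new_value = q_update(q, "far-from-box", "turn-right", r=0.5,
                     y="near-box", actions=actions, alpha=0.5, gamma=0.9)
```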

Comparisons

Straight BP Neural Network

SENSOR     BITS  DESC
Sensor 0   1     Is the box within the observation range detected by the fiducial? no/yes
Sensor 1   1     Within approach distance? no/yes (appDistance)
Sensor 2   1     Is the box to the left or to the right? (sign of the angle beta)
Sensor 3   1     Align from the left or from the right? (umbAngAlign)
Sensor 4   1     Alignment angle with the axis (vert/horiz) below the threshold? no/yes (umbAngAlign)
Sensor 5   1     Is the distance less than the minimum forward-advance distance? (umbPosPerpndl, umbPosFrente)
Sensor 6   1     Is the robot oriented perpendicular to / facing the box? (in front)
Sensor 7   2     Alignment angle below umbAngAlineacionFinal (0 = below threshold, 1 = negative angle, 2 = positive angle, 3 = fully aligned)
Sensor 8   2     Number of beams (0 = 0, ... 3 = 3)
Sensor 9   2     Gripper state (0 = open, 1 = closed, 2 = moving)

ACTION     BITS  DESC
Action 0   2     speed, avoidspeed, aproxspeed
Action 1   3     noturn, turnleft, turnright, alignleft, alignright
Action 2   1     opengripper, closegripper

Patterns (Sensory Input → Output)
//Approaching
Sensory input: 1-0-0-0-00-00-0, 1-0-0-0-0-00-00-0
Output: 0-3-0, 0-4-0
//Aligning with axis (vert/horizon)
Sensory input: 1-1-0-0-00-00-0, 1-1-0-0-0-00-00-0
Output: 1-3-0, 1-4-0
//Go straightforward
Sensory input: 1-1-0-0-1-1-0-00-00-0, 1-1-0-0-00-00-0
Output: 1-0-0, 2-0-0
//Rotate
Sensory input: 1-1-0-0-00-00-0, 1-1-1-0-0-00-00-0
Output: 1-3-0, 1-4-0
//Correct position
Sensory input: 1-1-0-0-1-00-00-0, 1-1-0-0-1-01-00-0, 1-1-0-0-1-10-00-0
Output: 1-3-0, 1-4-0
//Grasp object
Sensory input: 1-1-0-0-1-11-10-0, 1-1-0-0-0-0-1-11-01-0
Output: 1-0-1, 1-0-0

AIS Mathematical Model
(Subscripts kept from the original: apu = bid, impuesto = tax.)
Strength: $S_i(t+1) = S_i(t) - B_i(t) - T_i(t) + R_i(t)$
Bid: $B_i(t) = C_{apu}\, S_i(t) + (k_1 \cdot \text{Bid.Radio.BRPow})$
Noise: $AE_i = B_i + N(\sigma_{apu})$
Tax 1: $C_{impuesto} = 1 - (1/2)^{1/n}$
Tax 2: $T_i = C_{impuesto}\, S_i$
Final strength: $S(t+1) = S(t) - C_{apu}\, S(t) - C_{impuesto}\, S(t) + R(t) - Imp_{apu}\, C_{apu}\, S(t)$
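A short sketch of the final strength update from this slide, S(t+1) = S(t) − Capu S(t) − Cimpuesto S(t) + R(t) − Impapu Capu S(t). It assumes the tax coefficient reads as a half-life tax, Cimpuesto = 1 − (1/2)^(1/n), since the exponent is flattened in the slide text. The parameter names and sample values are hypothetical.

```python
def tax_coefficient(half_life: float) -> float:
    """Assumed half-life form: C_tax = 1 - (1/2) ** (1 / n)."""
    return 1.0 - 0.5 ** (1.0 / half_life)


def update_strength(strength, reward, c_bid, c_tax, bid_tax):
    """One step of the final strength equation:
    S(t+1) = S - c_bid*S - c_tax*S + R - bid_tax*c_bid*S."""
    bid = c_bid * strength
    return strength - bid - c_tax * strength + reward - bid_tax * bid


# Made-up values: strength 100, reward 10, 10% bid, 1% tax, 10% bid tax.
next_strength = update_strength(100.0, 10.0, c_bid=0.1, c_tax=0.01, bid_tax=0.1)
one_step_half_life = tax_coefficient(1)  # an antibody loses half its strength per step
```

With a half-life of n steps, an antibody that never receives a reward and never bids keeps half of its strength after n applications of the tax.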

Parameters

Reinforcement Functions

Obstacle Avoidance Behaviour:
In case of collision: $r = -1$
Otherwise: $r = 0.3 \cdot \sum (d_c - d_p) + 0.7 \cdot \sum (dc_c - dc_p)$
Where,
d: distance between the object and the robot
dc: distance of collision (sensors in the front) between the object and the robot
c: current value
p: prior value

Pick up object Behaviour:
In case of achieving the goal: $r = 1$
Otherwise: $r = 0.5 \cdot \sum \frac{1}{e^{(d_1 - d_2)}} + 0.5 \cdot \sum \frac{1}{e^{(a_1 - a_2)}}$
Where,
d: distance between the object and the robot
a: angle between the object and the robot
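A hedged sketch of the obstacle-avoidance reinforcement, reading the slide's formula as r = −1 on collision and otherwise r = 0.3 Σ(d_c − d_p) + 0.7 Σ(dc_c − dc_p), with c the current and p the prior reading. Treating the sums as running over paired sensor readings is an assumption, and the argument names are hypothetical.

```python
def obstacle_avoidance_reward(collided, d_curr, d_prev, dc_curr, dc_prev):
    """Return -1 on collision; otherwise weight the change in object distance
    (0.3) and the change in front collision distance (0.7)."""
    if collided:
        return -1.0
    distance_gain = sum(c - p for c, p in zip(d_curr, d_prev))
    collision_gain = sum(c - p for c, p in zip(dc_curr, dc_prev))
    return 0.3 * distance_gain + 0.7 * collision_gain


# Made-up readings: the robot moved away from obstacles on every sensor.
r_safe = obstacle_avoidance_reward(False, [1.0, 2.0], [0.5, 1.5], [1.0], [0.0])
r_crash = obstacle_avoidance_reward(True, [1.0, 2.0], [0.5, 1.5], [1.0], [0.0])
```

Moving away from obstacles yields a positive reward and approaching them a negative one, so the sign of the reinforcement tracks the change in clearance between two steps.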

AIS architecture

BN Mathematical Model
Behavior Definition
- Predecessor Link (precondition)
- Successor Link (add list)
- Conflicter Link (delete list)

BN Mathematical Model

Hierarchical Task Networks

Procedural Integration
Cognitivist Level: Associative Rule
- Chunk Table: [legs, 4] [color, brown] [shape, square]
- Chunk Chair: [legs, 4] [color, black] [material, metal]
Emergent Level: Associative Memory
[Diagram: the World State feeds an Associative Memory whose nodes (table, chair, 4, legs, brown, black, color, square, shape, metal, material, …) are connected by links such as Has-a.]

Ontogenetic Approach: (Development)
Task: Collect A-Boxes
Parameters π, θ, φ, δ, γ: the mean level of activation; the threshold for becoming active; the amount of energy for propositions; the amount of energy for goals; the amount of energy for protected goals.
[Diagram: behaviour network with the behaviours Store A-Box, Pick-Up A-Box, Charge Battery, Look-For A-Box, Look-For A-Storage, Look-For Battery Charger and Avoid Obstacles. Condition labels on the behaviours: StorgId, NoStorgId, BoxId, BoxNoId, BoxGrasp, WithBatt, NoBattery, ChargerId, NoChargId, NoObst.]

a) Behavior Network topology
[Diagram: behaviours B1 to B5 and modules M1 to M5 under the root BN, with precondition Pc1 and AND nodes.]
b) Expression Tree for the Behavior Network
[Diagram: tree rooted at BN, combining M1 to M5, Pc1 and B1 to B5 through AND nodes.]
c) Chromosomal Encoding
Positions 0-10 (head domain): BN AND M1 M2 M3 AND Pc1 AND B2 B1 M4
Positions 11-24 (tail domain, then params domain): M5 B4 B5 AND B2 B4 B3 B2, then 0 4 1 7
Global Parameters: Dp = {0.345, 0.567, 0.123, 0.987, 0.345, 0.889, 0.765, 0.01, 0.234, 0.543}