Balancing an Inverted Pendulum with a Multi-Layer Perceptron

ECE 539 Final Project, Spring 2000
Chad Seys

Outline
• The Inverted Pendulum
• The Problem
• Approach
• Position Representation
• Output Force Representation
• Initialization
• Convergence & Reinitialization
• Results
• Discussion

The Inverted Pendulum:
• The abstraction is a rigid rod attached at its lower end to a pivot point.
• It is like balancing a broom on the palm of a hand.
• Useful in modeling:
  – Launching a rocket into space

The Problem:
• Train a multi-layer perceptron to:
  – keep an inverted pendulum in its upright position
  – move an inverted pendulum from any position to the upright position (and keep it balanced there).

Approach:
• Divide the 180 degrees of pendulum travel into M arc segments (where M is odd).
  – M is odd to provide a central region where no force is applied.
  – There will be M input neurons, one per segment.
• There will be two output neurons, whose outputs will be interpreted as opposing force vectors of fixed magnitude.
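The segment division can be sketched as follows. This is a minimal illustration, not the project's code: the function names are hypothetical, and it assumes the angle is measured in degrees from the vertical, running from -90 (horizontal on one side) to +90 (horizontal on the other).

```python
def angle_to_segment(theta_deg, m):
    """Map an angle in [-90, 90] degrees from vertical to a segment index 0..m-1.

    Assumption (not from the slides): segments have equal width 180/m and
    are numbered from the -90 degree side to the +90 degree side.
    """
    if m < 1 or m % 2 == 0:
        raise ValueError("m must be a positive odd number")
    if not -90.0 <= theta_deg <= 90.0:
        raise ValueError("angle must lie within the 180-degree range")
    width = 180.0 / m
    index = int((theta_deg + 90.0) // width)
    # theta_deg == 90 would land one past the end, so clamp it.
    return min(index, m - 1)


def center_segment(m):
    """Index of the central segment, where no force is applied."""
    return m // 2
```

With M = 9, each segment spans 20 degrees, so the central no-force region covers roughly 10 degrees either side of the vertical.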

Inverted Pendulum Position Representation
• A few of the possibilities to explore:
  – (Chosen) A “1” in the input dimension corresponding to the arc segment that the inverted pendulum currently occupies, and “0” in the other dimensions.
  – As above, but with a gradual decline to “0” in neighboring segments.
    • Might help prevent overshoot at the top.
  – Alternatively, put “0” to the left of the inverted pendulum, “0.5” at the inverted pendulum, and “1” to the right of the inverted pendulum.
    • Might provide more directional information.
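The three candidate encodings can be written out as a short sketch. The function names are hypothetical, and the linear falloff in the second encoding is an assumption: the slides say only that the value declines gradually to “0”.

```python
def one_hot(segment, m):
    """Chosen encoding: 1 at the occupied segment, 0 elsewhere."""
    return [1.0 if i == segment else 0.0 for i in range(m)]


def graded(segment, m, falloff=1):
    """1 at the occupied segment, declining linearly to 0 in the neighbors.

    `falloff` is a hypothetical parameter: the number of neighbors on each
    side that receive a nonzero value.
    """
    return [max(0.0, 1.0 - abs(i - segment) / (falloff + 1)) for i in range(m)]


def directional(segment, m):
    """0 to the left of the pendulum, 0.5 at it, 1 to the right."""
    return [0.0 if i < segment else 0.5 if i == segment else 1.0
            for i in range(m)]
```

Each function returns a list of M values, one per input neuron, so the encodings can be swapped without changing the network's input size.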

Output Force Representation
• The output neuron force vector will act at the center of mass of the inverted pendulum, perpendicular to the rod.
• Will use a supervised learning paradigm.
  – The training data will be a fixed correcting force that returns the inverted pendulum to the vertical.
• Ideally would use an unsupervised learning paradigm allowing correcting forces of varying magnitude, but unsure how to implement this.
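One way the teacher's data and the two opposing outputs could fit together is sketched below. The output ordering (push left, push right), the 0.5 threshold, and the sign convention are all assumptions made for illustration; the slides do not specify them.

```python
def teacher_target(segment, m):
    """Teacher output (push_left, push_right) for the occupied segment.

    Assumption: segments below the center index lie to the left of the
    vertical, so the correcting force pushes right, and vice versa.
    """
    center = m // 2
    if segment < center:
        return (0.0, 1.0)
    if segment > center:
        return (1.0, 0.0)
    return (0.0, 0.0)  # central region: no force applied


def net_force(outputs, magnitude, threshold=0.5):
    """Combine the two outputs into one signed force of fixed magnitude.

    Positive means a push to the right. If both outputs fire, the two
    opposing forces cancel.
    """
    push_left, push_right = outputs
    force = 0.0
    if push_right > threshold:
        force += magnitude
    if push_left > threshold:
        force -= magnitude
    return force
```

Because the magnitude is fixed, the network can choose only the direction of the correction, which is what makes the choice of force size in the simulation so sensitive.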

Initialization
• At the top, with a small movement in one direction or the other.
• At increasing angles from the top, with no movement (not included in the final version of the project).

Convergence & Reinitialization
• The standard measure: the degree of match between the network output and the teacher’s data.
• Also stability: the number of simulation steps over which the inverted pendulum stays within a small number of degrees of the top.
  – This may be the criterion for reinitialization.
  – Reinitialization may reset only the inverted pendulum position, not the network weights.
  – (Did not appear in the final version of the project.)
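Both measures can be computed from logged simulation data, as in the sketch below. The function names, the 0.5 threshold, and the 5-degree tolerance are illustrative assumptions, not values from the project.

```python
def match_rate(outputs, targets, threshold=0.5):
    """Fraction of samples whose thresholded outputs agree with the teacher."""
    if not targets:
        return 0.0
    hits = 0
    for out, tgt in zip(outputs, targets):
        # A sample counts as a match only if every output neuron agrees.
        if all((o > threshold) == (t > threshold) for o, t in zip(out, tgt)):
            hits += 1
    return hits / len(targets)


def longest_upright_run(angles_deg, tolerance_deg=5.0):
    """Longest run of consecutive steps within tolerance of the vertical."""
    best = run = 0
    for angle in angles_deg:
        run = run + 1 if abs(angle) <= tolerance_deg else 0
        best = max(best, run)
    return best
```

A reinitialization rule could then reset the pendulum position whenever the current run ends, leaving the network weights untouched.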

[Figure: network architecture. M arc segments feed M input neurons (1 to M), followed by H hidden neurons (1 to H) and 2 output neurons, which produce the fixed output force.]
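A forward pass through a network with M input neurons, H hidden neurons, and 2 output neurons might look like the sketch below. The slides do not state the activation function, the use of bias terms, or the weight initialization, so the sigmoid units, biases, and uniform initial weights here are assumptions.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_layer(n_in, n_out, rng):
    """One weight row per neuron: n_in weights followed by a bias (assumed)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
            for _ in range(n_out)]


def layer_forward(weights, inputs):
    """Sigmoid of the weighted input sum plus bias, for each neuron."""
    return [sigmoid(row[-1] + sum(w * x for w, x in zip(row[:-1], inputs)))
            for row in weights]


def forward(hidden_w, output_w, inputs):
    """M inputs -> H hidden activations -> 2 outputs."""
    return layer_forward(output_w, layer_forward(hidden_w, inputs))


# Example with hypothetical sizes M = 9 and H = 5.
rng = random.Random(0)
hidden_w = init_layer(9, 5, rng)
output_w = init_layer(5, 2, rng)
position = [0.0] * 9
position[4] = 1.0  # pendulum in the central segment
outputs = forward(hidden_w, output_w, position)
```

The two values in `outputs` lie between 0 and 1 and would be read as the two opposing force outputs.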

Results (Force vs. Time Step):
• It was difficult to find a balance between force and sampling interval.
  – Too large a force resulted in overcorrection.

Results (Force vs. Time Step):
  – Too small a force resulted in undercorrection.
  – Smaller time steps solve this problem, but increase memory usage and processing time.
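The interplay between force size and time step can be reproduced with a small simulation. This sketch does not use the project's dynamics, which the slides do not give: it assumes a uniform rod pivoted at its lower end, a force applied perpendicular to the rod at its center of mass, explicit Euler integration, and an idealized controller that always pushes toward the vertical. All names and default values are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def step(theta, omega, force, dt, length=1.0, mass=1.0):
    """One explicit Euler step; theta is in radians from the vertical.

    For a uniform rod about its end, I = m*L^2/3, so
    alpha = (3*g / (2*L)) * sin(theta) + 3*F / (2*m*L).
    """
    alpha = (1.5 * G / length) * math.sin(theta) + 1.5 * force / (mass * length)
    return theta + omega * dt, omega + alpha * dt


def simulate(force_mag, dt, steps, theta0=0.05):
    """Apply a fixed-magnitude push toward the vertical at every step."""
    theta, omega = theta0, 0.0
    trace = []
    for _ in range(steps):
        if theta > 0:
            force = -force_mag
        elif theta < 0:
            force = force_mag
        else:
            force = 0.0
        theta, omega = step(theta, omega, force, dt)
        trace.append(theta)
    return trace
```

Under these assumptions, a large force held over a long time step throws the rod past the vertical, while a force too small to overcome gravity lets the angle keep growing. Shrinking `dt` limits how far one push can carry the rod, at the cost of more steps per simulated second.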

Did not reach 100% convergence.
– One promising simulation (one that appeared neither undercorrected nor overcorrected) ran for several days (>69000 iterations) and achieved a convergence rate of only 61.3%.
– Judging by how the pendulum falls during the testing section of the simulation, the neural network does not yet appear to have “learned” to balance the inverted pendulum.

Results
• Did not succeed in balancing an inverted pendulum within the duration of the simulation runs.