Machine Learning for Control of Morphing Air Vehicles
John Valasek, Amanda Lampton, Adam Niksch
Aerospace Engineering Department, Texas A&M University
SAE Aerospace Control and Guidance Systems Committee Meeting 99
1 March 2007, Boulder, CO
Student Research Team 2006 - 2007 Valasek, Lampton, Niksch - 2
Briefing Agenda
§ Shape Changing / Morphing for Micro Air Vehicles
§ Learning
§ Control
§ Learning and Control: Adaptive-Reinforcement Learning Control
§ Some Results
§ Some Conclusions
§ Research Issues
§ Extensions
Biomimetic Example 1
Biomimetic Example 2
§ 300-million-year-old design
§ 97% success rate in capturing prey
§ Flight relies entirely on 3-D unsteady aerodynamics
§ Unstable in all axes
§ Can hover, as well as go from flying forward at +100 body lengths/sec (~60 mph) to -3 bl/sec within 5 bl
§ Can alter heading by 90 degs in less than 0.1 sec
§ Dragonflies and other flying insects have the fastest visual processing system in the animal kingdom
§ These feats are accomplished with the neuro-circuitry of a brain smaller than a sesame seed.
Andrew Meade, Rice University
Biomimetic Example 3: Cossack
Courtesy Graham Taylor, Oxford University
Biomimetic Example 3
Courtesy Graham Taylor, Oxford University
Biomimetic Example 3: Cossack system identification using OKID (Observer/Kalman Filter Identification)
Biomimetic Example 3: Kinematic Engineering Model of Eagle Wing
Courtesy Graham Taylor, Oxford University
Biomimetic Flight Summary
§ Reconfigurable; shape changing (morphing); rigid or elastic body plants
§ Multi-Input Multi-Output (MIMO) systems
§ Distributed sensing, actuation, and propulsion
– Multiple, possibly non-co-located sensors
– Can have a large number of actuators (control allocation problem)
– High bandwidth actuators (gust alleviation)
§ Robust Adaptive Control
– Flow control
– Exogenous inputs (turbulence and gusts)
– Damage and fault tolerance
§ Distributed and limited processing capability
– Volumetric limit
– Mass limit
– Power limit
– Real-time operation
– Information processing system
§ Integrated guidance, navigation, and control
– Self learning, self adapting, self tuning
Morphing Aircraft: DARPA's Definition
A Multi-Role Platform that:
§ Changes its state substantially to adapt to changing mission environments.
§ Provides superior system capability not possible without reconfiguration.
§ Uses a design that integrates innovative combinations of advanced materials, actuators, flow controllers, and mechanisms to achieve the state change.
Changes on the order of 50% more wing area, or wing span and chord (Aerospace America, Feb 2004)
Which Morphing?
§ Morphing for Mission Adaptation
– Large scale, relatively slow, in-flight shape change to enable a single vehicle to perform multiple diverse mission profiles
or:
§ Morphing for Control
– In-flight physical or virtual shape change to achieve multiple control objectives (maneuvering, flutter suppression, load alleviation, active separation control)
John Davidson, NASA Langley, AFRL Morphing Controls Workshop, Feb 2004
Large Morphing Air Vehicle Models
Lockheed Martin; NextGen & Barron Associates
Small Morphing Air Vehicle Models: lots of modeling, very little control
§ Cornell (collaborator): Garcia & Lipson
– Morphing dynamical model and simulation that incorporates aerodynamic and structural effects
– Validated with experimental data
– Basic morphing parameters (incidence angle, dihedral angle)
§ Maryland: Hubbard
– Actuating a flapping wing structure with SMAs
– Structure, material, distribution of actuators
§ Florida: Lind
– Morphing flight demonstrator vehicle
– Basic morphing parameters (dihedral angle, sweep angle)
– H-infinity control
Technical Approach and Scope
• Use a biologically inspired approach to understand and control the:
– morphing (shape changing),
– flight control (stability, flight path, gust tolerance, stall), and
– maneuvering (perching and hovering)
of multi-mission micro air vehicles.
• Control the physics, in concert with nonlinearities,
– not as a robustness band-aid for aerodynamic and structural uncertainties and lack of understanding.
• Shape change the entire vehicle, not a component on the vehicle (DARPA definition).
Big Picture Research Goals
• Address:
– WHEN to morph, perch, etc.
– HOW to morph, perch, etc.
– LEARNING to morph, perch, etc.
All while keeping the (pointy?) end facing forward.
The Mathematical Domain Issue
A common mathematical domain helps to avoid ad-hoc approaches.
[Diagram: Machine Learning (learning a reconfiguration policy) and Adaptive Control (adapting parameters in a known functional relationship), with "adapting?" and "learning?" posed across the two; common ground: state-based methods]
Adaptive-Reinforcement Learning Control (A-RLC)
Conceptual Control Architecture for Reconfigurable or Morphing Aircraft
§ SAMI: Structured Adaptive Model Inversion (Traditional Control)
– Flight controller to handle wide variation in dynamic properties due to shape change
§ ML: Machine Learning (Intelligent Control)
– Learn the morphing dynamics and the optimal shape at every flight condition in real-time
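SAMI itself is formulated for the full nonlinear vehicle dynamics. The sketch below is only a scalar analogue of adaptive model inversion, assuming a hypothetical plant x_dot = a*x + b*u with known b and an unknown, piecewise-constant a; the jump in a at t = 10 s stands in for a shape change. The gains, update law, and reference signal are illustrative assumptions, not the SAMI design.

```python
import math

def simulate_adaptive_inversion(t_end=20.0, dt=1e-3, k=5.0, gamma=10.0):
    """Track x_ref(t) = sin(t) for a hypothetical scalar plant x_dot = a*x + b*u.

    The true 'a' jumps at t = 10 s (a stand-in for a morphing shape change);
    'b' is assumed known. Returns (final tracking error, final estimate of a).
    """
    b = 1.0
    x, a_hat = 0.5, 0.0              # initial condition error, zero prior estimate
    n_steps = int(round(t_end / dt))
    for i in range(n_steps):
        t = i * dt
        a_true = -1.0 if t < 10.0 else 2.0   # plant becomes unstable after the jump
        x_ref, x_ref_dot = math.sin(t), math.cos(t)
        e = x - x_ref
        # Model inversion with the current estimate: desired rate minus estimated dynamics
        u = (x_ref_dot - k * e - a_hat * x) / b
        # Lyapunov-based update law a_hat_dot = gamma * e * x (gives V_dot = -k e^2)
        a_hat += dt * gamma * e * x
        # Forward-Euler integration of the true plant
        x += dt * (a_true * x + b * u)
    e_final = x - math.sin(n_steps * dt)
    return e_final, a_hat
```

With this update law the error dynamics are e_dot = (a - a_hat)*x - k*e, so tracking is recovered after each piecewise-constant parameter change without knowing the new value of a.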
Control Architecture
[Diagram blocks: Reconfiguration Command Generation; Control Information Distribution; Environment; Knowledge Base; System Performance Evaluation; Adaptive Controller; Synthetic Jets for Virtual Shaping and Separation Control; Multi-Sensor MEMS Arrays for Flow Control Feedback; Sensed Information Aggregation]
Morphing Air Vehicle Evolution
§ 2-D Plate (2004)
§ Rectangular Block (2005)
§ Ellipsoid (2006)
§ Delta Wing (2007)
§ Final Objective
Morphing Air Vehicle Model: TiiMY
§ Shape
– Ellipsoidal shape with varying axis dimensions
– Constant volume (V) during morphing
– 2 independent variables, y and z; x is the dependent dimension
§ Morphing Dynamics
– Smart material: shape memory alloy (SMA)
– Morphing dynamics: simple nonlinear differential equations
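The constant-volume constraint is what makes x a dependent dimension. A minimal sketch, assuming x, y, z are semi-axis lengths so that V = (4/3)·π·x·y·z (the slide does not state whether the dimensions are semi-axes or full axes):

```python
import math

def dependent_semi_axis(volume, y, z):
    """Return the x semi-axis that keeps the ellipsoid volume fixed.

    Assumes y and z are semi-axis lengths and V = (4/3) * pi * x * y * z.
    """
    if y <= 0 or z <= 0:
        raise ValueError("semi-axes must be positive")
    return 3.0 * volume / (4.0 * math.pi * y * z)

# A unit sphere has V = 4*pi/3; halving y while keeping z forces x to double.
V = 4.0 * math.pi / 3.0
x = dependent_semi_axis(V, 0.5, 1.0)   # approximately 2.0
```

Any commanded change in y or z therefore implies a compensating change in x, so the learner only has to choose two shape variables.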
Morphing Dynamics
[Figure panels: Y-morphing; Z-morphing]
Shape Morphing Simulation: TiiMY
Optimal Shapes at Various Flight Conditions
§ Optimality is defined by identifying a cost function:
J = J(current shape, flight condition)
6-DOF Mathematical Model for Dynamic Behavior
§ Variables
§ Nonlinear 6-DOF equations
– Kinematic level
– Acceleration level
§ Drag force: additional dynamics due to morphing
– Function of air density, square of velocity along each axis, and projected area of the vehicle perpendicular to that axis
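The drag equations themselves are not reproduced here. The sketch below assumes the common quadratic form D = ½·ρ·v²·C_D·A applied independently along each body axis, with a single hypothetical drag coefficient `cd` and the ellipsoid's projected area normal to each axis (for semi-axes x, y, z, the area normal to the x axis is π·y·z). It shows why morphing changes the translational dynamics: changing the semi-axes changes the areas.

```python
import math

def axis_drag(rho, velocity, semi_axes, cd=1.0):
    """Drag force along each body axis of an ellipsoid (hypothetical model).

    velocity  : (u, v, w) body-axis velocity components [m/s]
    semi_axes : (x, y, z) ellipsoid semi-axes [m]
    cd        : assumed drag coefficient, the same on every axis
    The projected area normal to each axis is the ellipse formed by the
    other two semi-axes, e.g. A_x = pi * y * z.
    """
    x, y, z = semi_axes
    areas = (math.pi * y * z, math.pi * x * z, math.pi * x * y)
    # Each force opposes its velocity component and scales with its square.
    return tuple(-0.5 * rho * cd * a * vi * abs(vi) for a, vi in zip(areas, velocity))
```

For example, `axis_drag(1.225, (10.0, 0.0, -2.0), (2.0, 1.0, 0.5))` gives a large retarding force along x, zero along y, and a small positive force along z (opposing the negative w).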
Learning
Knowledge Based Control
§ Candidates to develop inference mechanism
– Rules-Based Expert System
• Model the knowledge of human experts
• Imitate the natural behaviour of birds
– Machine Learning
• Learn the optimal control mechanism by wind-tunnel experiments & flight tests
• Possible learning algorithms include Artificial Neural Networks (ANN), Explanation-Based Learning (EBL), and Reinforcement Learning (RL)
– Biologically inspired control process
• Mimic the behaviour of birds
[Cartoon: "Question: How many control theorists does it take to change a ... using Reinforcement Learning? Artificial Neural Networks?"]
Simple Example
Reinforcement Learning - 1
§ Supervised or unsupervised learning? Sequential decision making.
– Knowledge is based on experience and interaction with the environment, not on input-output data supplied by an external supervisor
§ Achieves a specific goal by learning from interactions with the environment
– Considers state information (s)
– Performs sequences of actions (a), observing the consequences
– Attempts to maximize rewards (r) over time
• These specify what is to be achieved, not how to achieve it
– Constructs a state value function (V)
• Learns an optimal control policy
§ Memory is contained in the state value function
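To make "constructs a state value function" concrete, here is a minimal TD(0) sketch on the classic 5-state random walk. This is a textbook toy chosen for illustration, not the morphing problem. The agent only sees states, transitions, and rewards, and the learned V ends up holding the memory of that experience; the true values are 1/6, 2/6, ..., 5/6.

```python
import random

def td0_random_walk(episodes=10000, alpha=0.01, seed=0):
    """TD(0) estimate of the state value function V for a 5-state random walk.

    States 1..5 are non-terminal; 0 and 6 are terminal. Each step moves left
    or right with equal probability; reaching state 6 pays reward 1, all other
    rewards are 0, and there is no discounting. The true values are s/6.
    """
    rng = random.Random(seed)
    V = [0.0] + [0.5] * 5 + [0.0]   # terminal values stay fixed at 0
    for _ in range(episodes):
        s = 3                        # every episode starts in the middle
        while s not in (0, 6):
            s_next = s + rng.choice((-1, 1))
            r = 1.0 if s_next == 6 else 0.0
            # TD(0): move V(s) toward the one-step bootstrapped target r + V(s')
            V[s] += alpha * (r + V[s_next] - V[s])
            s = s_next
    return V[1:6]
```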
Reinforcement Learning - 2
§ Learning is done repetitively, by subjecting the agent to different scenarios
§ Learning is cumulative and lifelong
§ Formulations are generally based on finite Markov Decision Processes (MDPs)
§ 3 major candidate algorithms:
– Dynamic programming
– Monte Carlo methods
– Temporal difference learning
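Of the three candidate families, dynamic programming is the one that needs a complete model of the MDP. A minimal value-iteration sketch on a hypothetical deterministic chain of discrete shape settings (every step costs -1, the target shape is absorbing) shows the idea. The converged values are minus the number of steps to the target.

```python
def value_iteration(n_states=7, target=4, gamma=1.0, tol=1e-9):
    """Dynamic programming (value iteration) on a toy deterministic chain.

    Hypothetical setting: states 0..n_states-1 are discrete shape settings,
    actions move one step left or right, every step costs -1, and the
    target shape is absorbing with value 0.
    """
    V = [0.0] * n_states
    while True:
        delta = 0.0
        for s in range(n_states):
            if s == target:
                continue                      # absorbing goal state
            candidates = []
            for step in (-1, 1):
                s_next = min(max(s + step, 0), n_states - 1)   # clip at the ends
                candidates.append(-1.0 + gamma * V[s_next])
            best = max(candidates)            # Bellman optimality backup
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V
```

Monte Carlo and temporal difference methods estimate the same values from sampled experience instead of sweeping a known model, which is why they suit a vehicle that must learn from interaction.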
Reinforcement Learning: Self Training
1. Actor takes action based upon states and preference function
2. Critic updates state value function, and evaluates action
3. Actor updates preference function
Learning is done repetitively, by subjecting the agent to different scenarios.
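The three actor-critic steps can be sketched on a hypothetical toy chain: start at state 0, reach the goal at the far end, reward -1 per step. The actor keeps preferences h[s][a] and picks actions from their softmax; the critic keeps V[s] and scores each action with the TD error, which the actor then uses to adjust its preference. The step sizes and discount are illustrative assumptions.

```python
import math
import random

def actor_critic_chain(n_states=5, episodes=2000, alpha=0.1, beta=0.1,
                       gamma=0.9, seed=1):
    """One-step actor-critic on a toy chain (hypothetical example).

    States 0..n_states-1, start at 0, goal at n_states-1. Actions:
    0 = left, 1 = right. Every step gives reward -1.
    Actor: preferences h[s][a] with a softmax policy.
    Critic: state value function V[s], trained with the TD error.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    h = [[0.0, 0.0] for _ in range(n_states)]
    V = [0.0] * n_states
    for _ in range(episodes):
        s = 0
        for _ in range(200):                 # step cap per episode
            if s == goal:
                break
            # 1. Actor picks an action from the softmax of its preferences
            m = max(h[s])
            w = [math.exp(p - m) for p in h[s]]
            a = 0 if rng.random() < w[0] / (w[0] + w[1]) else 1
            s_next = max(s - 1, 0) if a == 0 else s + 1
            r = -1.0
            v_next = 0.0 if s_next == goal else V[s_next]
            # 2. Critic evaluates the action via the TD error and updates V
            delta = r + gamma * v_next - V[s]
            V[s] += beta * delta
            # 3. Actor raises (or lowers) the preference for the action taken
            h[s][a] += alpha * delta
            s = s_next
    return h, V
```

After training, the preference for "right" exceeds the preference for "left" in every non-goal state.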
Two Illustrative Examples
Reinforcement Learning: Familiar Example (Cessna 208B Super Cargomaster)
§ State:
– Gain Kφ in the roll control law δa = Kφ(φcmd - φ)
§ Actions:
– Increase gain by small amount
– Decrease gain by small amount
§ Constraints/Boundaries
– Max overshoot: 2%
– Rise time: 8 s
– Settling time: 10 s
§ Interesting Features
– ε-greedy policy incorporated
– Upward annealing of γ incorporated
§ Matlab: ~250 sec real-time for 1000 learning episodes
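The learner changes Kφ and is judged against the overshoot, rise-time, and settling-time boundaries. The sketch below shows how such a check could be computed. It assumes a hypothetical second-order roll model (φ_dot = p, p_dot = l_p·p + l_da·δa) with made-up coefficients, not the Cessna 208B data. With these values the closed loop is φ_ddot + 2·φ_dot + Kφ·φ = Kφ·φcmd, so raising the gain trades speed for overshoot.

```python
def roll_step_metrics(k_phi, l_p=-2.0, l_da=1.0, phi_cmd=1.0, t_end=30.0, dt=1e-3):
    """Closed-loop roll step response for delta_a = k_phi * (phi_cmd - phi).

    Hypothetical roll model (not Cessna 208B data): phi_dot = p,
    p_dot = l_p * p + l_da * delta_a. Returns (max overshoot fraction,
    10-90% rise time [s], 2% settling time [s]).
    """
    phi, p = 0.0, 0.0
    t_hist, phi_hist = [], []
    for i in range(int(round(t_end / dt)) + 1):
        t_hist.append(i * dt)
        phi_hist.append(phi)
        delta_a = k_phi * (phi_cmd - phi)
        # Semi-implicit Euler: update the roll rate first, then the bank angle
        p += dt * (l_p * p + l_da * delta_a)
        phi += dt * p
    overshoot = max(0.0, (max(phi_hist) - phi_cmd) / phi_cmd)
    t10 = next(t for t, y in zip(t_hist, phi_hist) if y >= 0.1 * phi_cmd)
    t90 = next(t for t, y in zip(t_hist, phi_hist) if y >= 0.9 * phi_cmd)
    # Settling time: just after the last instant outside the +/-2% band
    outside = [t for t, y in zip(t_hist, phi_hist) if abs(y - phi_cmd) > 0.02 * phi_cmd]
    settling = outside[-1] + dt if outside else 0.0
    return overshoot, t90 - t10, settling

def meets_specs(k_phi, max_os=0.02, max_tr=8.0, max_ts=10.0):
    """Check the slide's boundaries: 2% overshoot, 8 s rise, 10 s settling."""
    os_, tr, ts = roll_step_metrics(k_phi)
    return os_ <= max_os and tr <= max_tr and ts <= max_ts
```

In this toy model, Kφ = 1 is critically damped and satisfies all three boundaries. Kφ = 4 overshoots by about 16%. Kφ = 0.25 is too sluggish. An RL agent stepping the gain up or down would be rewarded for staying inside the feasible region.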
Reinforcement Learning familiar example
Reinforcement Learning familiar example
Smart Block Demo 1: Aerial Obstacle Course
[Figure: obstacle course marked Start and Finish]
Smart Block: First Try
Smart Block: Second Try
Smart Block: New Course
Adaptive-Reinforcement Learning Control
§ Valasek, John, Tandale, Monish D., and Rong, Jie, "A Reinforcement Learning Adaptive Control Architecture for Morphing," Journal of Aerospace Computing, Information, and Communication, Volume 2, Number 4, pp. 174-195, April 2005.
§ Valasek, John, Doebbler, James, Tandale, Monish D., and Meade, Andrew J., "Improved Adaptive-Reinforcement Learning Control for Morphing Unmanned Air Vehicles," Journal of Aerospace Computing, Information, and Communication (in review).
§ Tandale, Monish D., Rong, Jie, and Valasek, John, "Preliminary Results of Adaptive-Reinforcement Learning Control for Morphing Aircraft," AIAA-2004-5358, Proceedings of the AIAA Guidance, Navigation, and Control Conference, Providence, RI, 16-19 August 2004.
A-RLC Architecture
Air Vehicle Example
Air Vehicle Example
§ Objective
– Demonstrate optimal shape morphing for multiple specified flight conditions
§ Method
– For every flight condition, learn an optimal policy that commands the voltage producing the optimal shape
– Minimize total cost over the entire flight trajectory
– Evaluate the learning performance after 200 learning episodes
The RL module is completely ignorant of optimality relations and morphing control functions: it must learn on its own, from scratch.
Timmy Demo: Reinforcement Learning Definitions
§ Agent: Morphing air vehicle reinforcement learning module
§ Environment: Various flight conditions
§ Goal: Fly in the optimal shape that minimizes cost
§ States: Flight condition; shape of vehicle
§ Actions: Discrete voltages applied to change shape of vehicle
– Action set:
§ Rewards: Determined by cost functions
§ Optimal control policy: Mapping of the state to the voltage leading to the optimal shape
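These definitions map directly onto a tabular Q-learning loop. The sketch below is a deliberately simplified stand-in. The state is (flight condition, discrete shape index). Three hypothetical "voltage" actions shrink, hold, or grow the shape by one step, replacing the SMA dynamics. The reward is minus the distance to a hypothetical optimal shape. The agent is never told the optimum directly, only the reward. The original work used function approximation (KNN, then SFA) rather than a lookup table.

```python
import random

def train_morphing_q(episodes=3000, steps=20, alpha=0.5, gamma=0.9,
                     epsilon=0.2, seed=0):
    """Tabular Q-learning on a toy morphing problem (all numbers hypothetical).

    State: (flight condition index, discrete shape index 0..10).
    Actions: three discrete 'voltages' that shrink, hold, or grow the shape.
    Reward: minus the distance to the optimal shape for the flight condition.
    """
    optimal = [2, 7, 5, 9, 0, 4]          # hypothetical optimum per flight condition
    n_shapes, moves = 11, (-1, 0, 1)
    rng = random.Random(seed)
    Q = {(fc, s): [0.0, 0.0, 0.0] for fc in range(len(optimal)) for s in range(n_shapes)}
    for _ in range(episodes):
        fc, s = rng.randrange(len(optimal)), rng.randrange(n_shapes)
        for _ in range(steps):
            q = Q[(fc, s)]
            # epsilon-greedy action selection
            a = rng.randrange(3) if rng.random() < epsilon else q.index(max(q))
            s_next = min(max(s + moves[a], 0), n_shapes - 1)
            r = -abs(s_next - optimal[fc])
            # Q-learning update toward r + gamma * max_a' Q(s', a')
            q[a] += alpha * (r + gamma * max(Q[(fc, s_next)]) - q[a])
            s = s_next
    return Q, optimal

def greedy_rollout(Q, fc, s, steps=15, n_shapes=11):
    """Follow the learned greedy policy from shape s at flight condition fc."""
    moves = (-1, 0, 1)
    for _ in range(steps):
        q = Q[(fc, s)]
        s = min(max(s + moves[q.index(max(q))], 0), n_shapes - 1)
    return s
```

After training, the greedy policy drives the shape to the optimum for each flight condition and holds it there.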
Episodic Learning
§ Unsupervised learning episode
§ Single pass through a 100 meter flight path in 200 seconds
§ Reference trajectory is generated arbitrarily
§ The flight condition changes twice during each episode
§ Shape change iteration after every 1 second
§ Exploration-exploitation dilemma:
– Explorative early, exploitative later
– ε-greedy policy with decreasing ε
§ Limited training examples
– Only 6 discrete flight conditions:
§ 2000 samples for KNNPI
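"Explorative early, exploitative later" is usually implemented by shrinking ε over the episodes. A minimal sketch, assuming a linear schedule. The start and end values are illustrative, and the 200-episode horizon matches the evaluation point on the Air Vehicle Example slide.

```python
import random

def epsilon_schedule(episode, n_episodes, eps_start=0.9, eps_end=0.05):
    """Linearly decrease epsilon from eps_start to eps_end over training."""
    frac = min(episode / max(n_episodes - 1, 1), 1.0)
    return eps_start + frac * (eps_end - eps_start)

def epsilon_greedy(q_values, epsilon, rng):
    """Explore with probability epsilon, otherwise exploit the best action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)
```

Usage: at episode n, call `epsilon_greedy(q, epsilon_schedule(n, 200), rng)`. Early episodes pick mostly random voltages, and late episodes almost always pick the best-known one.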
Demo
Comparison of True Optimal Shape and Learned Shape
§ KNN learns poorly for several flight conditions
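For reference, a k-nearest-neighbour approximator estimates Q(s, a) by averaging stored samples near the query state. The exact KNN variant (KNNPI) used in this work is not described here. The Euclidean metric, the value of k, and the plain averaging below are assumptions, shown only to illustrate the kind of approximator being compared.

```python
import math

def knn_q(samples, state, action, k=3):
    """k-nearest-neighbour estimate of Q(s, a) (a simplified illustration).

    samples: list of (state_vector, action, q_value) gathered during learning.
    The estimate averages the q_values of the k stored samples with the same
    action whose states are closest (Euclidean distance) to the query state.
    """
    same_action = [(math.dist(s, state), q) for s, a, q in samples if a == action]
    if not same_action:
        raise ValueError("no samples stored for this action")
    nearest = sorted(same_action)[:k]
    return sum(q for _, q in nearest) / len(nearest)
```

Because the estimate is a local average of stored samples, its accuracy depends on how densely the samples cover each region of the state space.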
What Happened?
§ Function Approximation
– Errors remained which could not be eliminated with additional training
– Use Galerkin-based Sequential Function Approximation (SFA) to approximate the action-value function Q(s, a)
Comparison of True Optimal Shape and Learned Shape
§ New SFA approach learns optimal shape well
Comparison: Normalized RMS Error
Method       Y dimension   Z dimension
KNN          1.42          0.821
SFA          1.27          0.661
Reduction    10%           20%
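The rounded reductions in the table can be checked directly from the error values:

```python
def percent_reduction(before, after):
    """Relative reduction from before to after, in percent."""
    return 100.0 * (before - after) / before

knn = {"Y": 1.42, "Z": 0.821}
sfa = {"Y": 1.27, "Z": 0.661}
reductions = {axis: round(percent_reduction(knn[axis], sfa[axis]), 1) for axis in knn}
# reductions -> {'Y': 10.6, 'Z': 19.5}
```

So SFA lowers the normalized RMS error by about 10.6% in Y and 19.5% in Z, consistent with the quoted 10% and 20%.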
What Does This Show?
§ Reinforcement Learning successfully learns the optimal control policy that results in the optimal shape at every flight condition.
– Can function in real-time, leading to better performance as the system operates over the long term.
§ Adaptive-Reinforcement Learning Control is a promising candidate for control of Mission Morphing.
– Maintains asymptotic tracking in the presence of parametric uncertainties and initial condition errors.
§ Shape changes for "Mission Morphing" can be treated as piecewise constant parameter changes
– SAMI is a favorable method for trajectory tracking control
§ "Morphing for Control" will require a different control strategy
– Piecewise constant approximation is no longer valid
Issues & Future Directions
§ Realistic structural response effects
– Aeroelastic behaviour
– SMA models and hysteretic behaviour
• Preisach model is algebraic, only has major hysteresis loops
• Solution: roll your own with R-L
Issues & Future Directions
§ Time scale problem: control methodologies to handle faster shape changes
– Hovakimyan's Adaptive Control
– Linear Parameter Varying (LPV) control
§ Novel distributed sensing and distributed actuation on a large(!) scale
§ Learning on a continuous domain
§ Modify the simulation to include a more advanced aircraft model
– Insect and avian inspired sensing
– Continuous versus discrete
– Wing-Body, Wing-Body-Empennage, etc.
§ Build and fly R/C class morphing demonstrator UAV
Issues & Future Directions: R-L for Morphing Airfoils & Wings
§ Incorporate aerodynamic and structural effects due to large shape changing
§ Degrees of Freedom
– Thickness: 6% to 24%
– Camber: 0% to 10%
– Max camber location: 0.2c to 0.8c
– Chord: 1 unit to ? units
– Angle-of-attack: -5° to 10° (within linear range)
§ Cost Function
– Potential components
• Specified CL
• Minimum drag
• Minimum peak stress
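The thickness, camber, and max-camber-location degrees of freedom match the parameters of the NACA 4-digit family, so those equations are one convenient way to turn a learner's action into an airfoil shape. Using NACA 4-digit geometry here is an assumption, not something stated for this work.

```python
import math

def naca4_surface(x, thickness, camber, camber_pos):
    """Upper and lower surface points at chord station x (0..1), NACA 4-digit form.

    thickness  : max thickness / chord (e.g. 0.06 to 0.24)
    camber     : max camber / chord (e.g. 0.0 to 0.10)
    camber_pos : chordwise location of max camber / chord (e.g. 0.2 to 0.8)
    Thickness is applied perpendicular to the mean camber line.
    """
    # Standard NACA 4-digit half-thickness distribution (open trailing edge)
    yt = 5.0 * thickness * (0.2969 * math.sqrt(x) - 0.1260 * x - 0.3516 * x**2
                            + 0.2843 * x**3 - 0.1015 * x**4)
    m, p = camber, camber_pos
    if m == 0.0:
        yc, dyc = 0.0, 0.0                     # symmetric airfoil
    elif x < p:
        yc = m / p**2 * (2 * p * x - x**2)     # forward of max camber
        dyc = 2 * m / p**2 * (p - x)
    else:
        yc = m / (1 - p)**2 * ((1 - 2 * p) + 2 * p * x - x**2)   # aft of max camber
        dyc = 2 * m / (1 - p)**2 * (p - x)
    theta = math.atan(dyc)
    upper = (x - yt * math.sin(theta), yc + yt * math.cos(theta))
    lower = (x + yt * math.sin(theta), yc - yt * math.cos(theta))
    return upper, lower
```

For a 12% thick symmetric section (thickness=0.12, camber=0.0), the upper surface at x = 0.3 sits about 0.06 chords above the chord line. With camber=0.04 and camber_pos=0.4, the mean of the two surfaces at x = 0.4 is the 0.04 maximum camber.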
Morphing Airfoil Demonstration
Questions?
John Valasek
Aerospace Engineering Department
Texas A&M University
3141 TAMU
College Station, TX 77843-3141
(979) 845-1685
valasek@aero.tamu.edu
§ FSL Web Page: http://flutie.tamu.edu/~fsl