Toward Grounding Knowledge in Prediction
or Toward a Computational Theory of Artificial Intelligence

Rich Sutton, AT&T Labs
with thanks to Satinder Singh and Doina Precup

It’s Hard to Build Large AI Systems
• Brittleness
• Unforeseen interactions
• Scaling
• Requires too much manual complexity management
  – people must understand, intervene, patch, and tune
  – like programming
• Need more autonomy
  – learning, verification
  – internal coherence of knowledge and experience

Marr’s Three Levels of Understanding
• Marr proposed three levels at which any information-processing machine must be understood
  – Computational Theory Level: what is computed and why
  – Representation and Algorithm Level
  – Hardware Implementation Level
• We have little computational theory for Intelligence
  – Many methods for knowledge representation, but no theory of knowledge
  – No clear problem definition
  – Logic

Reinforcement Learning provides a little Computational Theory
• Policies (controllers): States → Pr(Actions)
• Value Functions
• 1-Step Models
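
To make these three objects concrete, here is a minimal tabular sketch in Python (the class names and representations are illustrative assumptions, not from the talk):

```python
# A minimal tabular sketch of the three objects RL contributes as
# computational theory. All names here are illustrative.
import random
from collections import defaultdict

class Policy:
    """pi : States -> Pr(Actions), stored as a table of probabilities."""
    def __init__(self, action_probs):
        self.action_probs = action_probs        # {state: {action: prob}}

    def sample(self, state):
        actions, probs = zip(*self.action_probs[state].items())
        return random.choices(actions, weights=probs)[0]

class ValueFunction:
    """V : States -> expected total future reward."""
    def __init__(self):
        self.v = defaultdict(float)             # {state: value}

class OneStepModel:
    """Predicts the immediate next state and reward for (state, action)."""
    def __init__(self):
        self.table = {}                         # {(state, action): (next_state, reward)}

    def predict(self, state, action):
        return self.table[(state, action)]
```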

Outline of Talk
• Experience
• Knowledge = Prediction
• Macro-Predictions
• Mental Simulation
offering a coherent candidate computational theory of intelligence

Experience
• AI agent should be embedded in an ongoing interaction with a world
  [Diagram: Agent sends actions to the World; the World returns observations]
  Experience = these 2 time series
• Enables clear definition of the AI problem
  – Let {reward_t} be a function of {observation_t}
  – Choose actions to maximize total reward (cf. textbook definitions)
• Experience provides something for knowledge to be about
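
A minimal sketch of this embedding, assuming a toy black-box world and a placeholder agent (all names here are illustrative):

```python
import random

class World:
    """A black box, known only by its I/O behavior."""
    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        self.state = (self.state + action) % 10  # toy dynamics, hidden from the agent
        return self.state                        # the observation

class Agent:
    def act(self, observation):
        return random.choice([0, 1])             # placeholder behavior

def run(agent, world, steps):
    actions, observations = [], [world.reset()]
    for _ in range(steps):
        actions.append(agent.act(observations[-1]))
        observations.append(world.step(actions[-1]))
    return actions, observations                 # experience = these 2 time series

acts, obs = run(Agent(), World(), steps=20)
rewards = [int(o == 0) for o in obs]             # e.g., reward_t as a function of observation_t
```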

What is Knowledge?
Deny the physical world
Deny existence of objects, people, space…
Deny all non-answers, correspondence theories
All we really know about is our experience
Knowledge must be in terms of experience

Grounded Knowledge
“A is always followed by B”:
• if o_t = A then o_{t+1} = B   (A, B observations)
• if A(o_t) then B(o_{t+1})   (A, B predicates)
• Action conditioning: if A(o_t) and C(a_t) then B(o_{t+1})
All of these are predictions.

World Knowledge = Predictions
• The world is a black box, known only by its I/O behavior (observations in response to actions)
• Therefore, all meaningful statements about the world are statements about the observations it generates
• The only observations worth talking about are future ones
Therefore: the only meaningful things to say about the world are predictions

Non-predictive “Knowledge”
• Mathematical knowledge, theorems and proofs
  – always true, but tell us nothing about the world
  – not world knowledge
• Uninterpreted signals, e.g., useful representations
  – real and useful, but not by themselves world knowledge, only an aid to acquiring it
• Knowledge of the past
• Policies
  – could be viewed as predictions of value
  – but by themselves are more like uninterpreted signals
Predictions capture “regular”, descriptive world knowledge

Grounded Knowledge (1-step predictions)
“A is always followed by B”:
• if o_t = A then o_{t+1} = B   (A, B observations)
• if A(o_t) then B(o_{t+1})   (A, B predicates)
• Action conditioning: if A(o_t) and C(a_t) then B(o_{t+1})
Still a pretty limited kind of knowledge. Can’t say anything beyond one step!
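
A minimal sketch of what it means to test such a 1-step, action-conditioned prediction against recorded experience (the predicates and toy data are illustrative assumptions):

```python
# Test a 1-step prediction "if A(o_t) and C(a_t) then B(o_{t+1})"
# against recorded experience (two time series: observations, actions).

def holds(pred_A, pred_C, pred_B, observations, actions):
    """Return the fraction of applicable steps on which the prediction held."""
    applicable, correct = 0, 0
    for t in range(len(actions)):
        if pred_A(observations[t]) and pred_C(actions[t]):
            applicable += 1
            correct += pred_B(observations[t + 1])
    return correct / applicable if applicable else None

# Illustrative toy usage: in a counter world, "up" after an even
# observation yields an odd observation.
obs = [0, 1, 2, 3, 4, 5]
acts = ["up"] * 5
rate = holds(lambda o: o % 2 == 0, lambda a: a == "up",
             lambda o: o % 2 == 1, obs, acts)
print(rate)  # 1.0 -- the prediction held every time it applied
```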

Grounded Knowledge (from 1-step predictions to macro-predictions)
“A is always followed by B”:
• if o_t = A then o_{t+1} = B   (A, B observations)
• if A(o_t) then B(o_{t+1})   (A, B predicates)
• Action conditioning: if A(o_t) and C(a_t) then B(o_{t+1})
• Macro-prediction: if A(o_t) and <arbitrary experiment, many steps long> then B(<outcome>), many steps later
The if-part supplies the prior grounding; the predicted outcome supplies the posterior grounding.

Both Prior and Posterior Grounding are Needed
• “Classical” AI systems omit prior grounding
  – e.g., “Tweety is a bird”, “John loves Mary”
  – sometimes called the “symbol grounding problem”
• Modern AI systems tend to skimp on the posterior
  – supervised learning, Bayes nets, robotics…
• It is not OK to leave posterior grounding to external, human observers
  – the information is just not in the machine
  – we don’t understand it; we haven’t done our job!
• Yet this is such an appealing shortcut that we have almost always taken it

Outline of Talk
• Experience
• Knowledge = Prediction
• Macro-Predictions
• Mental Simulation
offering a coherent candidate computational theory of intelligence

Macro-Predictions (Options)
à la Sutton, Precup & Singh, 1999, et al.
Let π : States → Pr(Actions) be an arbitrary policy
Let β : States → Pr({0, 1}) be a termination condition
Then <π, β> is a kind of experiment:
  – do π until β = 1
  – measure something about the resulting experience
Suppose we measure the outcome:
  – the state at the end of the experiment
  – the total reward during the experiment
Then the macro-prediction for <π, β> would predict Pr(end-state), E{total reward} given start-state
This is a very general, expressive form of prediction
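
A minimal sketch of executing such an experiment and estimating its macro-prediction by Monte Carlo (the environment interface env.step(state, action) -> (next_state, reward) is an illustrative assumption):

```python
import random

def run_option(env, state, pi, beta, gamma=1.0, max_steps=10_000):
    """Do policy pi from `state` until termination condition beta fires.

    Returns the experiment's outcome: the end-state and the total
    (optionally discounted) reward accumulated along the way.
    """
    total_reward, discount = 0.0, 1.0
    for _ in range(max_steps):
        if random.random() < beta(state):        # beta(s) = Pr(terminate in s)
            break
        state, reward = env.step(state, pi(state))
        total_reward += discount * reward
        discount *= gamma
    return state, total_reward

def estimate_macro_prediction(env, start, pi, beta, n=1000):
    """Monte Carlo estimate of Pr(end-state) and E{total reward} for <pi, beta>."""
    end_counts, mean_reward = {}, 0.0
    for _ in range(n):
        end, r = run_option(env, start, pi, beta)
        end_counts[end] = end_counts.get(end, 0) + 1
        mean_reward += r / n
    return {s: c / n for s, c in end_counts.items()}, mean_reward
```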

Rooms Example (Sutton, Precup, & Singh, 1999)
[Figure: the policy of one option]

Planning with Macro-Predictions
[Figure]

Learning Path-to-Goal with and without Hallway Macros (Options)
[Figure]

Mental Simulation
• Knowledge can be gained from experience
  – by actually performing experiments
• But knowledge can also be gained without overt experience
  – we call this thinking, reasoning, planning, cognition…
• This can be done through “thought experiments”
  – internal simulation of experience
  – generated from predictive knowledge
  – subject to learning methods as before
• Much thought can be achieved this way…
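
A minimal sketch of this process in the spirit of Dyna-style planning (the model format and value-update details are illustrative assumptions, not the talk's code):

```python
import random

ACTIONS = ["up", "down", "left", "right"]        # illustrative action set

def mental_simulation(model, Q, alpha=0.1, gamma=0.95, n_thoughts=1000):
    """Improve action values from simulated rather than actual experience.

    `model` maps (state, action) -> (next_state, reward) and was learned
    earlier from real experience; Q is a dict of action values.
    """
    seen = list(model.keys())
    for _ in range(n_thoughts):
        s, a = random.choice(seen)               # imagine a previously seen situation
        s2, r = model[(s, a)]                    # predict what would follow
        best_next = max(Q.get((s2, a2), 0.0) for a2 in ACTIONS)
        # learn from the thought experiment exactly as from real experience
        target = r + gamma * best_next
        Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (target - Q.get((s, a), 0.0))
```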

Illustration: Dynamic Mission Planning for UAVs
[Figures: a mission map of sites with rewards 25, 15, 8, 5, and 10 plus a base; a chart of expected reward per mission under high and low fuel, comparing RL planning w/strategies and real-time control, RL planning w/strategies, and a static replanner]
• Mission: fly over (observe) the most valuable sites and return to base
• Stochastic weather affects the observability (cloudy or clear) of sites
• Limited fuel
• Intractable with classical optimal control methods
• Temporal scales:
  – Tactics: which way to fly now
  – Strategies: which site to head for
• Strategies compress space and time:
  – reduce the number of states from ~10^11 to ~10^6
  – reduce tour length from ~600 to ~6
• Reinforcement learning with strategies and real-time control outperforms an optimal tour planner that assumes static weather
Barto, Sutton, and Moll, Adaptive Networks Laboratory, University of Massachusetts

What to Compute and Why
[Diagram relating Reward, Policy, Value Functions, and Knowledge/Predictions]
The ultimate goal is reward, but our AI spends most of its time with knowledge.

A Candidate Computational Theory of Artificial Intelligence
• The AI agent should be focused on finding general macro-predictions of experience
• Especially seeking predictions that enable rapid computation of values and optimal actions
• Predictions and their associated experiments are the coin of the realm
  – they have a clear semantics, can be tested and learned
  – can be combined to produce other predictions, e.g., values
• Mental simulation (plus learning)
  – makes new predictions from old
  – the start of a computational theory of knowledge use

Conclusions
• World knowledge must be expressed in terms of the data
• Such posterior grounding is challenging:
  – we lose expressiveness in the short term
  – we lose external (human) coherence and explainability
• But it can be done step by step
• And it brings palpable benefits:
  – autonomous learning, verification, and extension of knowledge
  – autonomous complexity management due to internal coherence
  – knowledge suited to a general reasoning process: mental simulation
• We must provide this grounding!