Cog. Sci C131/Psych C123: Computational Models of Cognition
Tom Griffiths

Computation ↔ Cognition

Cognitive science
• The study of intelligent systems
• Cognition as information processing: input → output

Computational modeling
Look for principles that characterize both computation and cognition:
  computation: input → output
  cognition: input → output

Two goals
• Cognition: explain human cognition (and behavior) in terms of the underlying computation
• Computation: gain insight into how to solve some challenging computational problems

Computational problems
• Easy: arithmetic, algebra, chess
• Difficult:
  – learning and using language
  – sophisticated senses: vision, hearing
  – similarity and categorization
  – representing the structure of the world
  – scientific investigation
Human cognition sets the standard.

Three approaches
• Rules and symbols
• Networks, features, and spaces
• Probability and statistics

Logic
All As are Bs. All Bs are Cs. Therefore, all As are Cs.
Aristotle (384–322 BC)

Mechanical reasoning
Ramon Llull (1232–1315)

The mathematics of reason
Thomas Hobbes (1588–1679), René Descartes (1596–1650), Gottfried Leibniz (1646–1716)

Modern logic: P, Q, …
George Boole (1815–1864), Gottlob Frege (1848–1925)

Computation
Alan Turing (1912–1954)

Logic
Inference rules: P → Q
Facts: P, Q
The World
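
To make this concrete, here is a minimal forward-chaining sketch in Python: a rule like P → Q is applied to the current facts until nothing new can be derived. The particular facts and rules are illustrative, not from the slides.

```python
# Minimal forward-chaining sketch: apply inference rules to facts
# until nothing new can be derived. Facts and rules are illustrative.
facts = {"P"}
rules = [("P", "Q")]        # the rule P -> Q

derived = True
while derived:
    derived = False
    for antecedent, consequent in rules:
        if antecedent in facts and consequent not in facts:
            facts.add(consequent)   # modus ponens: from P and P -> Q, conclude Q
            derived = True

print(facts)                # {'P', 'Q'}
```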

Categorization

Categorization
cat: small, furry, domestic carnivore

Logic
Inference rules: P → Q
Facts: P, Q
The World

Early AI systems…
Workspace: Facts, Operations, Goals
Actions → The World → Observations

Rules and symbols
• Perhaps we can consider thought a set of rules, applied to symbols…
  – generating infinite possibilities with finite means
  – characterizing cognition as a “formal system”
• This idea was applied to:
  – deductive reasoning (logic)
  – language (generative grammar)
  – problem solving and action (production systems)

Language as a formal system
Noam Chomsky

Language
“a set (finite or infinite) of sentences, each finite in length and constructed out of a finite set of elements”
[Diagram: within the set of all sequences, L contains “This is a good sentence” (1) but not “Sentence bad this is” (0)]
“linguistic analysis aims to separate the grammatical sequences which are sentences of L from the ungrammatical sequences which are not”

A context-free grammar
S → NP VP
NP → T N
VP → V NP
T → the
N → man, ball, …
V → hit, took, …
Parse of “the man hit the ball”: [S [NP [T the] [N man]] [VP [V hit] [NP [T the] [N ball]]]]
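
A minimal sketch of this grammar as a sentence generator, using only the words listed on the slide (the elided “…” lexicon is left out):

```python
import random

# The context-free grammar from the slide (truncated lexicon).
grammar = {
    "S":  [["NP", "VP"]],
    "NP": [["T", "N"]],
    "VP": [["V", "NP"]],
    "T":  [["the"]],
    "N":  [["man"], ["ball"]],
    "V":  [["hit"], ["took"]],
}

def generate(symbol="S"):
    """Expand a symbol by recursively rewriting nonterminals."""
    if symbol not in grammar:                  # terminal: emit the word
        return [symbol]
    expansion = random.choice(grammar[symbol])
    return [word for s in expansion for word in generate(s)]

print(" ".join(generate()))  # e.g. "the man hit the ball"
```

A finite set of rules like this generates infinitely many structures once recursive rules are added, which is the “infinite possibilities with finite means” point above.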

Rules and symbols
• Perhaps we can consider thought a set of rules, applied to symbols…
  – generating infinite possibilities with finite means
  – characterizing cognition as a “formal system”
• This idea was applied to:
  – deductive reasoning (logic)
  – language (generative grammar)
  – problem solving and action (production systems)
• Big question: what are the rules of cognition?

Computational problems
• Easy: arithmetic, algebra, chess
• Difficult:
  – learning and using language
  – sophisticated senses: vision, hearing
  – similarity and categorization
  – representing the structure of the world
  – scientific investigation
Human cognition sets the standard.

Inductive problems
• Drawing conclusions that are not fully justified by the available data
  – e.g. detective work: “In solving a problem of this sort, the grand thing is to be able to reason backward. That is a very useful accomplishment, and a very easy one, but people do not practice it much.”
• Much more challenging than deduction!

Challenges for symbolic approaches
• Learning systems of rules and symbols is hard!
  – some people who think of human cognition in these terms end up arguing against learning…

The poverty of the stimulus
• The rules and principles that constitute the mature system of knowledge of language are actually very complicated
• There isn’t enough evidence to identify these principles in the data available to children
Therefore
• Acquisition of these rules and principles must be a consequence of the genetically determined structure of the language faculty

The poverty of the stimulus
Learning language requires strong constraints on the set of possible languages.
These constraints are “Universal Grammar”.

Challenges for symbolic approaches
• Learning systems of rules and symbols is hard!
  – some people who think of human cognition in these terms end up arguing against learning…
• Many human concepts have fuzzy boundaries
  – notions of similarity and typicality are hard to reconcile with binary rules
• Solving inductive problems requires dealing with uncertainty and partial knowledge

Three approaches
• Rules and symbols
• Networks, features, and spaces
• Probability and statistics

Similarity
What determines similarity?

Representations
What kind of representations are used by the human mind?

Representations
How can we capture the meaning of words? Semantic networks and semantic spaces.

Categorization

Computing with spaces
A linear classifier over perceptual features x1, x2, with output y (+1 = cat, −1 = dog) and an error signal for learning.
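
A minimal sketch of such a classifier; the weights here are hand-picked for illustration, not learned:

```python
import numpy as np

# Illustrative weights and bias; x = (x1, x2) are perceptual features.
w = np.array([0.8, -0.5])
b = 0.1

def classify(x):
    """Linear decision rule: +1 = cat, -1 = dog."""
    return 1 if np.dot(w, x) + b > 0 else -1

print(classify(np.array([1.0, 0.2])))   # +1: the cat side of the boundary
print(classify(np.array([0.0, 1.0])))   # -1: the dog side
```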

Networks, features, and spaces
• Artificial neural networks can represent any continuous function…

Problems with simple networks
Some kinds of data are not linearly separable: AND and OR are, but XOR is not.

A solution: multiple layers
output layer: y
hidden layer: z1, z2
input layer: x1, x2
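
A minimal sketch of a two-layer network that computes XOR: the hidden units z1 and z2 compute OR and AND of the inputs, and the output combines them. The threshold units and hand-picked weights are illustrative, not from the slides.

```python
import numpy as np

step = lambda a: (a > 0).astype(int)  # threshold activation

def xor_net(x1, x2):
    x = np.array([x1, x2])
    z1 = step(np.dot([1, 1], x) - 0.5)   # hidden unit: OR(x1, x2)
    z2 = step(np.dot([1, 1], x) - 1.5)   # hidden unit: AND(x1, x2)
    return step(1 * z1 - 2 * z2 - 0.5)   # output: z1 AND NOT z2, i.e. XOR

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", xor_net(a, b))     # 0, 1, 1, 0
```

With the hidden layer removed, no single set of weights reproduces the XOR column, which is exactly the linear-separability problem on the previous slide.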

Networks, features, and spaces
• Artificial neural networks can represent any continuous function…
• Simple algorithms for learning from data
  – fuzzy boundaries
  – effects of typicality

General-purpose learning mechanisms
Gradient descent on the error:
$$\Delta w_{ij} = -\eta \frac{\partial E}{\partial w_{ij}}$$
where $E$ is the error and $\eta$ is the learning rate.

The Delta Rule
For output $y = g\big(\sum_i w_i x_i\big)$ over perceptual features $x_1, x_2$ (+1 = cat, −1 = dog), and for any function $g$ with derivative $g'$:
$$\Delta w_i = \eta\,(t - y)\,g'\big(\textstyle\sum_j w_j x_j\big)\,x_i$$
Here $(t - y)$ is the output error and $x_i$ is the influence of the input.
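
A minimal sketch of the delta rule in Python, assuming a sigmoid for $g$ and recoding the targets as 1 = cat, 0 = dog to match its output range; the data and learning rate are illustrative:

```python
import numpy as np

g  = lambda a: 1 / (1 + np.exp(-a))   # sigmoid activation
dg = lambda a: g(a) * (1 - g(a))      # its derivative g'

# Illustrative data: rows are perceptual features (x1, x2);
# targets recoded as 1 = cat, 0 = dog to match the sigmoid's range.
X = np.array([[1.0, 0.2], [0.9, 0.1], [0.1, 0.9], [0.2, 1.0]])
t = np.array([1.0, 1.0, 0.0, 0.0])

w, eta = np.zeros(2), 0.5             # weights and learning rate
for _ in range(1000):
    for x, target in zip(X, t):
        net = np.dot(w, x)
        y = g(net)                    # network output
        w += eta * (target - y) * dg(net) * x   # the delta rule update

print(np.round(w, 2))                 # positive weight on x1, negative on x2
```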

Networks, features, and spaces
• Artificial neural networks can represent any continuous function…
• Simple algorithms for learning from data
  – fuzzy boundaries
  – effects of typicality
• A way to explain how people could learn things that look like rules and symbols…

Simple recurrent networks
output layer: x1, x2
hidden layer: z1, z2
input layer: x1, x2, plus context units (a copy of the previous hidden layer)
(Elman, 1990)
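
A minimal sketch of one forward step of an Elman network: the hidden state is computed from the current input plus the context units (a copy of the previous hidden state), which is then copied back for the next step. Layer sizes and random weights are illustrative, and no learning is shown.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 4, 3, 4                 # illustrative layer sizes
W_in  = rng.normal(size=(n_hid, n_in))       # input -> hidden
W_ctx = rng.normal(size=(n_hid, n_hid))      # context -> hidden
W_out = rng.normal(size=(n_out, n_hid))      # hidden -> output

def step(x, context):
    """One time step: hidden sees input + context; context := hidden."""
    hidden = np.tanh(W_in @ x + W_ctx @ context)
    output = W_out @ hidden
    return output, hidden                    # hidden becomes the next context

context = np.zeros(n_hid)
for x in np.eye(n_in):                       # a toy one-hot input sequence
    y, context = step(x, context)
```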

[Figure: hidden unit activations after 6 iterations of 27,500 words (Elman, 1990)]

Networks, features, and spaces
• Artificial neural networks can represent any continuous function…
• Simple algorithms for learning from data
  – fuzzy boundaries
  – effects of typicality
• A way to explain how people could learn things that look like rules and symbols…
• Big question: how much of cognition can be explained by the input data?

Challenges for neural networks
• Being able to learn anything can make it harder to learn specific things
  – this is the “bias-variance tradeoff”

Bias-variance tradeoff

What about generalization?

What happened?
• The set of 8th-degree polynomials contains almost all functions through 10 points
• Our data are some true function, plus noise
• Fitting the noise gives us the wrong function
• This is called overfitting
  – while it has low bias, this class of functions results in an algorithm that has high variance (i.e. is strongly affected by the observed data)
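
A minimal sketch of the effect, assuming an illustrative linear true function and noise level: a degree-8 polynomial fit to 10 noisy points tracks the noise, and its error on fresh inputs is worse than a straight-line fit.

```python
import numpy as np

rng = np.random.default_rng(1)
true_f = lambda x: 2 * x + 1                      # illustrative true function
x = np.linspace(0, 1, 10)
y = true_f(x) + rng.normal(scale=0.3, size=10)    # 10 noisy observations

flexible = np.polyfit(x, y, 8)                    # degree-8 polynomial
simple   = np.polyfit(x, y, 1)                    # straight line

x_new = np.linspace(0, 1, 100)                    # fresh inputs for generalization
err = lambda c: np.mean((np.polyval(c, x_new) - true_f(x_new)) ** 2)
print(err(flexible), err(simple))                 # the flexible fit generalizes worse
```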

The moral
• General-purpose learning mechanisms do not work well with small amounts of data (the most flexible algorithm isn’t always the best)
• To make good predictions from small amounts of data, you need algorithms with bias that matches the problem being solved
• This suggests a different approach to studying induction…
  – (what people learn as n → 0, rather than n → ∞)

Challenges for neural networks
• Being able to learn anything can make it harder to learn specific things
  – this is the “bias-variance tradeoff”
• Neural networks allow us to encode constraints on learning in terms of neurons, weights, and architecture, but is this always the right language?

Three approaches
• Rules and symbols
• Networks, features, and spaces
• Probability and statistics

Probability
Gerolamo Cardano (1501–1576)

Probability
Thomas Bayes (1701–1763), Pierre-Simon Laplace (1749–1827)

Bayes’ rule
How rational agents should update their beliefs in the light of data ($h$: hypothesis, $d$: data):
$$P(h \mid d) = \frac{P(d \mid h)\,P(h)}{\sum_{h'} P(d \mid h')\,P(h')}$$
$P(h \mid d)$ is the posterior probability, $P(d \mid h)$ the likelihood, $P(h)$ the prior probability, and the denominator sums over the space of hypotheses.
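
A minimal sketch of the computation over a discrete hypothesis space: multiply prior by likelihood, then normalize by the sum over hypotheses. The hypotheses and numbers are illustrative.

```python
# Illustrative hypothesis space with priors P(h) and likelihoods P(d | h).
prior      = {"h1": 0.7, "h2": 0.3}
likelihood = {"h1": 0.1, "h2": 0.9}   # probability of the observed data under each h

evidence = sum(prior[h] * likelihood[h] for h in prior)            # P(d)
posterior = {h: prior[h] * likelihood[h] / evidence for h in prior}
print(posterior)   # h2 now dominates despite its lower prior
```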

Cognition as statistical inference
• Bayes’ theorem tells us how to combine prior knowledge with data
  – a different language for describing the constraints on human inductive inference

Prior over functions
[Figure: samples from a prior over polynomials with k = 8; four panels with a fixed parameter of 5 and a varying parameter of 1, 0.3, 0.1, 0.01]

Maximum a posteriori (MAP) estimation
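
For reference, MAP estimation chooses the hypothesis with the highest posterior probability; since the denominator of Bayes’ rule does not depend on $h$, it drops out of the maximization:
$$h_{\text{MAP}} = \arg\max_h P(h \mid d) = \arg\max_h P(d \mid h)\,P(h)$$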

Cognition as statistical inference
• Bayes’ theorem tells us how to combine prior knowledge with data
  – a different language for describing the constraints on human inductive inference
• Probabilistic approaches also help to describe learning

Probabilistic context-free grammars
S → NP VP (1.0)
NP → T N (0.7) | N (0.3)
VP → V NP (1.0)
T → the (0.8) | a (0.2)
N → man (0.5) | ball (0.5)
V → hit (0.6) | took (0.4)
For the parse of “the man hit the ball”:
P(tree) = 1.0 × 0.7 × 1.0 × 0.8 × 0.5 × 0.6 × 0.7 × 0.8 × 0.5 = 0.04704
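
A minimal sketch that recomputes P(tree) for “the man hit the ball” from the rule probabilities above: the probability of a tree is the product of the probabilities of the rules it uses.

```python
# Rule probabilities from the slide, keyed by (LHS, RHS).
rules = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("T", "N")): 0.7,  ("NP", ("N",)): 0.3,
    ("VP", ("V", "NP")): 1.0,
    ("T", ("the",)): 0.8,     ("T", ("a",)): 0.2,
    ("N", ("man",)): 0.5,     ("N", ("ball",)): 0.5,
    ("V", ("hit",)): 0.6,     ("V", ("took",)): 0.4,
}

# The parse of "the man hit the ball" as the sequence of rules it uses.
parse = [("S", ("NP", "VP")), ("NP", ("T", "N")), ("T", ("the",)),
         ("N", ("man",)), ("VP", ("V", "NP")), ("V", ("hit",)),
         ("NP", ("T", "N")), ("T", ("the",)), ("N", ("ball",))]

p = 1.0
for rule in parse:
    p *= rules[rule]
print(p)   # 0.04704, matching the product on the slide
```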

Probability and learnability
• Any probabilistic context-free grammar can be learned from a sample from that grammar as the sample size becomes infinite
• Priors trade off with the amount of data that needs to be seen to believe a hypothesis

Cognition as statistical inference
• Bayes’ theorem tells us how to combine prior knowledge with data
  – a language for describing the constraints on human inductive inference
• Probabilistic approaches also help to describe learning
• Big question: what do the constraints on human inductive inference look like?

Challenges for probabilistic approaches
• Computing probabilities is hard… how could brains possibly do that?
• How well do the “rational” solutions from probability theory describe how people think in everyday life?

Three approaches
• Rules and symbols
• Networks, features, and spaces
• Probability and statistics