Chapters 5 and 7 Operant Learning

Operant (Instrumental) Learning
• Stimulus
• Response
• Outcome

Classical vs. Operant
• Classical
  – Reflex action
  – Neutral stimulus associated with US
  – Outside of subject’s control
• Operant
  – Strengthens/weakens “voluntary” action
  – Subject does/doesn’t respond
• Can occur together

Edward Thorndike
• Animal intelligence
• Comparative psychology
Image source: http://www.psicoterapiaintegrativa.com/therapists/htms/Edward_Thorndike.htm

Experiments
• Chicks, cats, dogs
• Single animals
• Observational learning

Puzzle Box

Trial-and-Error
(Thorndike 1898, p. 19)

Law of Effect
• “When particular stimulus-response sequences are followed by pleasure, those responses tend to be ‘stamped in’; responses followed by pain tend to be ‘stamped out’.” (Thorndike 1911)
• Reinforced
• Punished

Methodology
• Subjects
• Apparatus
• Escape latency
• Time-curves

All images Thorndike 1898, p. 18

Theory
• Incremental learning
• S-R
• Direct experience

Revision
• Scientific method
• Observational learning in non-humans

Image source: www1.appstate.edu/~kms/classes/psy3202/images/puzzleboxes.gif

B. F. Skinner
• Operant response
  – The unit of behaviour
  – Effect it has on environment
• Skinner’s approach (video)
• Operant chamber (video)

Discrete Trial & Free Operant
• Discrete
  – One trial at a time
  – Re-set apparatus
  – Measure a behaviour
  – Latency, running speed, reduction in errors
  – E.g., maze
• Free
  – Automatic repeat
  – Less disruptive for subject
  – Response rate
  – E.g., operant chamber

Three-Term Contingency
• Contingency: Y iff X
1. Discriminative stimulus (SD)
2. Operant response (R)
3. Outcome (O)
  – Appetitive or aversive

Outcomes and Effects
• Positive
  – Something is delivered
• Negative
  – Something is removed
• Reinforcer
  – Causes behaviour to increase
• Punisher
  – Causes behaviour to decrease
• Effect on behaviour re: “reinforcer” or “punisher”

Four Basic Operant Relations

                  Response causes stimulus to be:
Response rate:    Presented                          Removed
Increases         Positive Reinforcement             Negative Reinforcement
                  e.g. lever press --> get food      e.g. lever press --> stop shock
Decreases         Positive Punishment                Negative Punishment
                  e.g. lever press --> get shock     e.g. lever press --> food lost
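The four relations follow from two questions: is the stimulus presented or removed, and does response rate go up or down? A minimal sketch of that lookup follows; the function name and the string labels are illustrative assumptions, not part of the slides.

```python
def classify_relation(stimulus_change: str, rate_change: str) -> str:
    """Name the operant relation (hypothetical helper for illustration).

    stimulus_change: "presented" or "removed" (what the response does to the stimulus)
    rate_change:     "increases" or "decreases" (what happens to response rate)
    """
    # Presented vs. removed decides positive vs. negative.
    sign = {"presented": "Positive", "removed": "Negative"}[stimulus_change]
    # Rate going up vs. down decides reinforcement vs. punishment.
    kind = {"increases": "Reinforcement", "decreases": "Punishment"}[rate_change]
    return f"{sign} {kind}"


# The lever-press examples from the slide, expressed as inputs to the helper.
examples = {
    "lever press --> get food": ("presented", "increases"),
    "lever press --> stop shock": ("removed", "increases"),
    "lever press --> get shock": ("presented", "decreases"),
    "lever press --> food lost": ("removed", "decreases"),
}

for description, (stimulus_change, rate_change) in examples.items():
    print(description, "=", classify_relation(stimulus_change, rate_change))
```

Note that the label depends on the observed effect on behaviour, not on whether the stimulus seems pleasant.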

Types of Reinforcers
• Primary
  – Not dependent on an association with other reinforcers
• Secondary (“Conditioned Reinforcer”)
  – Neutral stimulus paired with primary reinforcer

Secondary Reinforcers
• “Bridging”, “clicker”
• Secondary extinction without periodic pairings with primary
• Generally weaker than primary
• Less prone to satiation
• Generalized reinforcer
  – Paired with many other kinds of reinforcers

Neurobiology of Reinforcement
• Pleasure centres of brain (reward pathway)
  – Electrical stimulation of brain (ESB)
• Dopamine
  – Major neurotransmitter
  – Released by appetitive stimuli

Dopamine Release
• Different amounts of dopamine released
• Unexpected reinforcement --> more dopamine release
  – Decreasing learning curve
  – Rescorla-Wagner
  – Less “surprising” the more you’ve learned; less dopamine released; less reinforcing
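The "less surprising, less reinforcing" idea can be sketched with the Rescorla-Wagner update, in which associative strength V moves toward its maximum λ by a fraction of the prediction error (λ − V) on each reinforced trial. In the sketch below the learning rate of 0.3, λ = 1.0, and the function name are illustrative assumptions, and reading the prediction error as a stand-in for "surprise" is a simplification rather than a measurement of dopamine.

```python
def rescorla_wagner(n_trials, learning_rate=0.3, lam=1.0):
    """Return (trial, strength_before_trial, prediction_error) for each trial.

    learning_rate stands in for the combined salience terms of the model;
    lam is the maximum associative strength the reinforcer can support.
    Both values are illustrative assumptions.
    """
    strength = 0.0
    history = []
    for trial in range(1, n_trials + 1):
        error = lam - strength            # how "surprising" the reinforcer is
        history.append((trial, strength, error))
        strength += learning_rate * error  # learning is proportional to surprise
    return history


for trial, strength, error in rescorla_wagner(8):
    print(f"trial {trial}: V = {strength:.3f}, surprise = {error:.3f}")
```

The surprise term shrinks on every trial, which gives the decelerating learning curve mentioned on the slide.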

Addictive
• Internal/external drugs
  – Orgasm, cocaine, crack
• Dopamine very addictive
• Dopamine is a precursor of epinephrine (adrenaline)
  – “Thrill junkies”
  – Tolerance develops

Strength of Operant Learning
• Condition practically any behaviour
• Shaping (successive approximations)

Shaping a Lever Press
• Gradual process
• Reinforce more appropriate/precise responses
• Feedback

Response Chains
• Sequences of behaviours in specific order
• Objective: primary reinforcer
• Conditioned reinforcers
• Discriminative stimuli

Backwards Chaining
• Often used with “complex” training
• Start with last response in chain
• Next, second last response
• Third last, etc.

Chaining
• SD: discriminative stimulus
• R: response
• SR: secondary reinforcer
• PR: primary reinforcer
(Diagram labels: SD1, SD2, SD3; R1: climb down, R2: walk, R3: climb up; SR2, SR3; PR)

Forward Chaining
• Start with first response
• Add additional links in chain

Factors in Operant Learning

Contiguity
• Time between behaviour & outcome
• Delays let other behaviours occur, forgetting, extinction (behaviour w/o reinforcement)
  – Learning with delay if stimulus “placeholder” provided (conditioned reinforcer?)
• Important re: punishment

Contingency
• Correlation between behaviour & outcome
• Strong vs. random contingency
• Both reinforcement and punishment
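One common way to put a number on the behaviour-outcome correlation is ΔP = P(outcome | response) − P(outcome | no response). The slides do not name a specific measure, so the choice of ΔP, the function name, and the made-up trial data below are assumptions for illustration only.

```python
def delta_p(trials):
    """Contingency as P(outcome | response) - P(outcome | no response).

    trials is a list of (responded, outcome_occurred) boolean pairs.
    A value near 1 means a strong contingency; near 0 means the outcome
    is unrelated to the response (random contingency).
    """
    with_response = [outcome for responded, outcome in trials if responded]
    without_response = [outcome for responded, outcome in trials if not responded]
    if not with_response or not without_response:
        raise ValueError("need trials both with and without the response")
    p_given_response = sum(with_response) / len(with_response)
    p_given_no_response = sum(without_response) / len(without_response)
    return p_given_response - p_given_no_response


# Illustrative data: the outcome follows 9 of 10 responses and never occurs otherwise.
strong = [(True, True)] * 9 + [(True, False)] + [(False, False)] * 10
# Illustrative data: the outcome is equally likely with or without the response.
random_like = ([(True, True)] * 5 + [(True, False)] * 5
               + [(False, True)] * 5 + [(False, False)] * 5)

print("strong contingency:", delta_p(strong))
print("random contingency:", delta_p(random_like))
```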

Outcome Characteristics
• Larger reinforcers/punishers --> stronger learning
  – Not a linear effect
• Qualitative differences in reinforcers and punishers
  – Species & individual differences
• Intensity of punisher
  – Tolerance

Task Characteristics
• Some tasks easier to learn than others
• Species & individual differences
• Innate and/or prior conditioning

Deprivation Levels
• Generally, the greater the deprivation, the more effective the reinforcer
• Reinforcer satiation
• Deprivation can motivate punishable responses

Reinforcers in Punishment
• What maintains undesired behaviour?
• Benefit?
• Alternative sources of reinforcement
  – Find other ways to provide acceptable reinforcement

Latent Learning
• Motivation
• Learning behaviour
• Performing behaviour

(Figure: Tolman & Honzik (1930). Average errors plotted against days, for a group given food and a group given no food until day 11.)

Extinction
• Response no longer produces same outcome
• Extinction burst
• Variability of behaviour
• Aggression and frustration
• Spontaneous recovery

Behaviour Modification
• Also “behaviour analysis”
• Alter behaviour via operant conditioning
• Therapy
• Reinforcement vs. punishment

Problems with Punishment in Behaviour Modification
• Application of the punisher
• Incorrect use of punishment
  – Creates issues or exacerbates punishment consequences
• Tolerance
  – Start with strong punisher
  – Gradually reduce
• General reluctance to administer

Possible Consequences of Punishment
• Escape
• Aggression, violence
  – At punisher, self, other
• Apathy
  – General suppression of other behaviours
• Abuse
  – Permanent damage
• Imitation

Alternatives to Using Punishment

Response Prevention
• Make it impossible to do punishable behaviour
• Circumvention
• Younger children

Extinction
• Identify reinforcer of behaviour
• Withhold reinforcer
• Difficult to ID reinforcer
• Extinction bursts
• Slow

Differential Reinforcement
• Differential reinforcement of low responses (DRL)
  – Only reinforce behaviour when response occurs at low frequency
• Differential reinforcement of zero responses (DR0)
  – Reinforcement contingent on not performing behaviour at all (in some time period)

• Differential reinforcement of alternative behaviour (DRA)
  – Reinforcer gained from undesired behaviour now only available when some alternative behaviour done
• Differential reinforcement of incompatible behaviour (DRI)
  – Reinforce behaviour completely incompatible with undesired response
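The two rate-based schedules (low responding and zero responding) can be written as simple rules over response timestamps. The sketch below is one possible reading, not a definitive procedure: it assumes "low frequency" means a minimum gap between consecutive responses, that the first response of a session is never reinforced, and that the zero-response rule is checked over fixed, back-to-back intervals. All names and numbers are hypothetical.

```python
def drl_reinforced(response_times, min_gap):
    """Low-rate rule: reinforce a response only if at least min_gap
    time units have passed since the previous response (assumed reading)."""
    reinforced = []
    previous = None
    for t in response_times:
        if previous is not None and t - previous >= min_gap:
            reinforced.append(t)
        previous = t
    return reinforced


def dro_deliveries(response_times, interval, session_length):
    """Zero-response rule: deliver the reinforcer at the end of every
    interval in which the behaviour did not occur at all."""
    deliveries = []
    start = 0
    while start + interval <= session_length:
        end = start + interval
        if not any(start <= t < end for t in response_times):
            deliveries.append(end)
        start = end
    return deliveries


# Hypothetical session: response times in seconds.
responses = [1, 3, 12, 14, 30]
print("low-rate rule reinforces responses at:", drl_reinforced(responses, min_gap=5))
print("zero-response rule delivers at:", dro_deliveries(responses, interval=10, session_length=40))
```

Under these assumptions the low-rate rule rewards well-spaced responses, while the zero-response rule rewards only stretches with no responding.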

Noncontingent Reinforcement
• Provide desired reinforcer on regular basis regardless of what is being done
• No correlation between response and outcome
• May work because subject gets reinforcer for “free”
• Problems if reinforcer comes after some other undesired behaviour (new acquisition)

Negative Punishment
• Removal of pleasant stimulus
• Time-out
• Popular in human behaviour modification

Other Techniques for Behavioural Deceleration
• Overcorrection
  – Repetitions of alternate, desired behaviour
• Restitution
• Positive practice
  – Technically, punishment
• Stimulus satiation

Escape and Avoidance

Definitions
• Escape
  – Get away from aversive stimulus that is in progress
• Avoidance
  – Get away from aversive stimulus before it begins

Shuttle Box
• Solomon & Wynne (1953)
  – Dogs
  – Chamber with barrier; shock
  – Light off as signal

Theory Issues
• For escape, no ambiguity
  – Aversive removed, behaviour increases = negative reinforcement
• What about avoidance?
  – Shuttles before shock
  – Behaviour increases
  – Nothing obvious removed or delivered
• Mowrer & Lamoreaux (1942)
  – “…not getting something can hardly, in and of itself, qualify as rewarding.”

Two-Process Theory
• Classical and operant conditioning
  – Shock = US
  – Fear/pain/jump/twitch/squeal = UR
  – Darkness = CS
  – Fear of dark = CR
• Fear: heart rate, breathing, stomach cramps, etc.
• Negative reinforcement
  – Removal of fear (CR)
• Escape from CS, not avoidance of shock
• Two-process treats avoidance as just another type of escape behaviour

Support for Two-Process Theory
• Rescorla & LoLordo (1965)
• Dog in shuttlebox
  – No signal
  – Response gives “safe time”
• Pair tone with shock
  – Tone increases rate of response
• CS can amplify avoidance
• Conditioned inhibition can reduce avoidance

Problems with Two-Process Theory
• Avoidance without observable fear
  – Heart rate
  – Not consistent
• Fear diminishes with avoidance learning

Measuring Fear
• Kamin, Brimer, and Black (1963)
  – Lever press --> food
  – Auditory CS --> avoidance in shuttle box until: 1, 3, 9, 27 avoidances in a row
  – CS in operant chamber; check for suppression of lever press

Results
• Fear decreases during extended avoidance training
• But, avoidance still strong
• Even low fear is enough?
(Figure: responding plotted against number of avoidance responses: 1, 3, 9, 27)

Extinction in Avoidance Behaviour
• Odd prediction from two-process theory
  – “Yo-yo” effect
  – Avoidance should toggle
• But! Avoidance is extremely persistent
(Figure: # of US received and successful avoidance, plotted across trials)

One-Process Theory
• Classical conditioning component unnecessary
• Two interpretations of reinforcer
  – Molar vs. molecular
  – Negative reinforcement: Overall reduction in exposure to punishers is reinforcer (text interpretation)
  – Positive reinforcement: Avoidance itself is reinforcer; subject gets reinforced by “safety” on a trial

Sidman Avoidance Task
• Free-operant avoidance
  – Can avoidance be learned if no warning CS?
• Shock at random intervals
• Response gives safe time
• Extensive training --> learn avoidance
  – But, usually never perfect
  – High variability across subjects
• Two-process theory suggests:
  – Time becomes a CS (time elicits fear)

Herrnstein & Hineline (1966)
• Rapid and slow shock rate schedules
• Response switches schedules
• Shocks presented randomly, no signal
• Responses give shock reduction
• Reduction in shock frequency is reinforcer

Learned Helplessness
• Behaviour has no effect on situation
• Generalizes
• Laboratory
  – Give inescapable shocks
  – Shuttle box
  – Will not switch sides
  – Expectation that behaviour has no effect

Learned Helplessness in Humans
• Depression
• Situations beyond your control
• Three dimensions
  – Situation: specific or global
  – Attribute: internal or external
  – Time: short-term or long-term

Therapeutic Application
• Confidence building (“can not fail”)
  – Implementation issues
• Tasks that can be successfully completed
  – Produces immunization
  – Escapable condition … inescapable condition
• Learned helplessness likely to develop

Theories of Operant Conditioning

Hull’s Drive Reduction Theory
• Animals have motivational states (drives)
• Necessary for survival
• Reinforcers are things that reduce drives
• Physiological value
  – Reduce physiological state

Drive Reduction Reinforcers
• Works well with primary reinforcers
• Many secondary reinforcers have no physiological value
• Hull: association links secondary to drive
• Some reinforcers hard to classify as primary or secondary
• Some increase a physiological state
• Some necessities undetectable
• Roller coasters
• Vitamins
• Saccharin

Relative Value Theory & Premack Principle
• Treat reinforcers as behaviours
• Is it the food, or the behaviour of eating that is the reinforcer?
• Behavioural probability scale
• Greater or lesser value of behaviours relative to one another
• No distinction between primary and secondary

Premack Principle
• One behaviour will reinforce a second behaviour
  – High probability behaviour reinforces low probability behaviour
• Baseline probability scale
  – Time
  – Rank order
• Reinforcement relativity
  – No absolutes

Probability of response = (Time spent on response) / (Total time)

Example
• Behaviours
  – Eat ice cream (I), play video game (V), read book (B)
• Baseline (30 minutes)
  – Student 1: I (2 min), V (8 min), B (20 min)
    • Scale: I -- V -- B
  – Student 2: I (8 min), V (20 min), B (2 min)
    • Scale: B -- I -- V
• Student 1: V reinforces I, B reinforces V & I
• Student 2: I reinforces B, V reinforces I & B
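The example can be checked mechanically: convert each student's baseline minutes into probabilities (time on behaviour divided by total time), then let every behaviour reinforce any behaviour with a lower baseline probability. The function names below are hypothetical; the minutes are the ones given in the example.

```python
def baseline_probabilities(minutes):
    """Probability of each behaviour = time spent on it / total baseline time."""
    total = sum(minutes.values())
    return {behaviour: spent / total for behaviour, spent in minutes.items()}


def premack_reinforcers(minutes):
    """Map each behaviour to the lower-probability behaviours it can reinforce."""
    p = baseline_probabilities(minutes)
    return {
        behaviour: sorted(other for other in p if p[other] < p[behaviour])
        for behaviour in p
    }


# Baseline minutes from the example (I = ice cream, V = video game, B = book).
student_1 = {"I": 2, "V": 8, "B": 20}
student_2 = {"I": 8, "V": 20, "B": 2}

for name, minutes in (("Student 1", student_1), ("Student 2", student_2)):
    print(name, baseline_probabilities(minutes))
    for behaviour, targets in premack_reinforcers(minutes).items():
        print(f"  {behaviour} reinforces {targets if targets else 'nothing'}")
```

The same activity (for example, reading) is the strongest reinforcer for one student and reinforces nothing for the other, which is the point of reinforcement relativity.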

Problems
• Baseline phase
  – Fair rating?
  – How to compare very different behaviours
• Time problems
  – What if time not important to behaviour?
  – Behaviour duration?
  – Length of baseline period?

Response Deprivation Theory
• Deprived behaviours = reinforcing behaviours
• Drop below baseline level of performance
• Not relative frequency of one behaviour compared to another (i.e., Premack)
• Level of deprivation for a behaviour