Chapters 5 and 7: Operant (Instrumental) Learning
- Slides: 74
Operant (Instrumental) Learning • Stimulus • Response • Outcome
Classical vs. Operant • Classical – Reflex action – Neutral stimulus associated with US – Outside of subject’s control • Operant – Strengthens/weakens “voluntary” action – Subject does/doesn’t respond • Can occur together
Edward Thorndike • Animal intelligence • Comparative psychology • Image: http://www.psicoterapiaintegrativa.com/therapists/htms/Edward_Thorndike.htm
Experiments • Chicks, cats, dogs • Single animals • Observational learning
Puzzle Box
Trial-and-Error Thorndike 1898, p. 19
Law of Effect • “When particular stimulus-response sequences are followed by pleasure, those responses tend to be ‘stamped in’; responses followed by pain tend to be ‘stamped out’.” (Thorndike 1911) • Reinforced • Punished
Methodology • Subjects • Apparatus • Escape latency • Time-curves
All images Thorndike 1898, p. 18
Theory • Incremental learning • S-R • Direct experience
Revision • Scientific method • Observational learning in non-humans
www1.appstate.edu/~kms/classes/psy3202/images/puzzleboxes.gif
B. F. Skinner • Operant response – The unit of behaviour – Effect it has on environment • Skinner’s approach (video) • Operant chamber (video)
Discrete Trial & Free Operant • Discrete – One trial at a time – Re-set apparatus – Measure a behaviour – Latency, running speed, reduction in errors – E.g., maze • Free – Automatic repeat – Less disruptive for subject – Response rate – E.g., operant chamber
Three-Term Contingency • Contingency: Y iff X • 1. Discriminative stimulus (SD) • 2. Operant response (R) • 3. Outcome (O) – Appetitive or aversive
Outcomes and Effects • Positive – Something is delivered • Negative – Something is removed • Reinforcer – Causes behaviour to increase • Punisher – Causes behaviour to decrease • Effect on behaviour re: “reinforcer” or “punisher”
Four Basic Operant Relations
Response rate | Response causes stimulus to be presented | Response causes stimulus to be removed
Increases | Positive reinforcement (e.g., lever press --> get food) | Negative reinforcement (e.g., lever press --> stop shock)
Decreases | Positive punishment (e.g., lever press --> get shock) | Negative punishment (e.g., lever press --> food lost)
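The four relations reduce to two questions: was the stimulus presented or removed, and did the response rate go up or down? A minimal Python sketch of that lookup follows; the function name and the string labels are illustrative assumptions, not terminology from the slides.

```python
def classify_operant_relation(stimulus_change: str, rate_change: str) -> str:
    """Name the operant relation from two observations.

    stimulus_change: "presented" or "removed" (what the response does to the stimulus)
    rate_change:     "increases" or "decreases" (what happens to the response rate)
    """
    if stimulus_change not in ("presented", "removed"):
        raise ValueError("stimulus_change must be 'presented' or 'removed'")
    if rate_change not in ("increases", "decreases"):
        raise ValueError("rate_change must be 'increases' or 'decreases'")

    # Positive/negative describes the stimulus; reinforcement/punishment
    # describes the effect on behaviour.
    sign = "positive" if stimulus_change == "presented" else "negative"
    effect = "reinforcement" if rate_change == "increases" else "punishment"
    return f"{sign} {effect}"


# Lever press --> stop shock, and lever pressing becomes more frequent.
print(classify_operant_relation("removed", "increases"))
```

Note that the label depends on the observed effect on behaviour, not on whether the stimulus seems pleasant or unpleasant.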
Types of Reinforcers • Primary – Not dependent on an association with other reinforcers • Secondary (“Conditioned Reinforcer”) – Neutral stimulus paired with primary reinforcer
Secondary Reinforcers • “Bridging”, “clicker” • Secondary extinction without periodic pairings with primary • Generally weaker than primary • Less prone to satiation • Generalized reinforcer – Paired with many other kinds of reinforcers
Neurobiology of Reinforcement • Pleasure centres of brain (reward pathway) – Electrical stimulation of brain (ESB) • Dopamine – Major neurotransmitter – Released by appetitive stimuli
Dopamine Release • Different amounts of dopamine released • Unexpected reinforcement --> more dopamine release – Decreasing learning curve – Rescorla-Wagner – Less “surprising” the more you’ve learned; less dopamine released; less reinforcing
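The "less surprising, less reinforcing" idea can be sketched with the Rescorla-Wagner update, in which the change in associative strength on each trial is proportional to the prediction error (maximum strength minus current strength). The sketch below is illustrative only: the learning rate of 0.3 and the function name are assumptions, and the code computes prediction error, not dopamine release.

```python
def prediction_errors(n_trials, learning_rate=0.3, max_strength=1.0):
    """Return the prediction error ("surprise") on each reinforced trial.

    Uses the Rescorla-Wagner form: delta_V = learning_rate * (max_strength - V).
    learning_rate and max_strength are assumed values for illustration.
    """
    strength = 0.0          # associative strength V before any training
    errors = []
    for _ in range(n_trials):
        error = max_strength - strength   # how unexpected the reinforcer is
        errors.append(error)
        strength += learning_rate * error  # learning shrinks the next error
    return errors


for trial, error in enumerate(prediction_errors(5), start=1):
    print(f"trial {trial}: prediction error = {error:.3f}")
```

The error shrinks on every trial, which gives the decelerating learning curve the slide refers to.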
Addictive • Internal/external drugs – Orgasm, cocaine, crack • Dopamine very addictive • Dopamine is a precursor of norepinephrine, which converts to epinephrine (adrenaline) – “Thrill junkies” – Tolerance develops
Strength of Operant Learning • Condition practically any behaviour • Shaping (successive approximations)
Shaping a Lever Press • Gradual process • Reinforce more appropriate/precise responses • Feedback
Response Chains • Sequences of behaviours in specific order • Objective: primary reinforcer • Conditioned reinforcers • Discriminative stimuli
Backwards Chaining • Often used with “complex” training • Start with last response in chain • Next, second last response • Third last, etc.
Chaining • SD: discriminative stimulus • R: response • SR: secondary reinforcer • PR: primary reinforcer • [Diagram: chain with responses R3: climb up, R2: walk, R1: climb down; stimuli SD3, SD2, SD1; reinforcers SR3, SR2, PR]
Forward Chaining • Start with first response • Add additional links in chain
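The two training orders can be sketched as a function that lists which part of the chain is practised at each stage. This is a hypothetical illustration: the function name is invented, and the example chain assumes the order climb up, walk, climb down.

```python
def chaining_stages(chain, backward=False):
    """List the sub-chain practised at each training stage.

    Forward chaining starts with the first response and adds links at the end.
    Backward chaining starts with the last response and adds links at the front.
    """
    if backward:
        return [chain[i:] for i in range(len(chain) - 1, -1, -1)]
    return [chain[:i] for i in range(1, len(chain) + 1)]


# Assumed order of responses, for illustration only.
chain = ["climb up", "walk", "climb down"]

print("Forward: ", chaining_stages(chain))
print("Backward:", chaining_stages(chain, backward=True))
```

In the backward case every stage ends with the final response, so each practised sub-chain finishes at the primary reinforcer.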
Factors in Operant Learning
Contiguity • Time between behaviour & outcome • Delays allow other behaviours, forgetting, and extinction (behaviour w/o reinforcement) – Learning with delay if stimulus “placeholder” provided (conditioned reinforcer?) • Important re: punishment
Contingency • Correlation between behaviour & outcome • Strong vs. random contingency • Both reinforcement and punishment
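One common way to put a number on contingency is to compare how often the outcome follows the behaviour with how often it occurs without the behaviour. The slides do not name a specific measure; the sketch below assumes this difference of probabilities (often written ΔP), and the function name and data layout are illustrative.

```python
def delta_p(trials):
    """Contingency as P(outcome | response) - P(outcome | no response).

    trials: list of (responded, outcome) pairs of booleans.
    Near 1.0 means a strong contingency; near 0.0 means the outcome is
    unrelated to the behaviour (a random contingency).
    """
    with_response = [outcome for responded, outcome in trials if responded]
    without_response = [outcome for responded, outcome in trials if not responded]
    if not with_response or not without_response:
        raise ValueError("need trials both with and without the response")
    p_given_response = sum(with_response) / len(with_response)
    p_given_no_response = sum(without_response) / len(without_response)
    return p_given_response - p_given_no_response


strong = [(True, True)] * 4 + [(False, False)] * 4
random_like = [(True, True), (True, False), (False, True), (False, False)]
print(delta_p(strong), delta_p(random_like))
```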
Outcome Characteristics • Larger reinforcers/punishers --> stronger learning – Not a linear effect • Qualitative differences in reinforcers and punishers – Species & individual differences • Intensity of punisher – Tolerance
Task Characteristics • Some tasks easier to learn than others • Species & individual differences • Innate and/or prior conditioning
Deprivation Levels • Generally, the greater the deprivation, the more effective the reinforcer • Reinforcer satiation • Deprivation can motivate punishable responses
Reinforcers in Punishment • What maintains undesired behaviour? • Benefit? • Alternative sources of reinforcement – Find other ways to provide acceptable reinforcement
Latent Learning • Motivation • Learning behaviour • Performing behaviour
[Figure: Tolman & Honzik (1930). Average errors plotted over days for two groups: food, and no food until day 11.]
Extinction • Response no longer produces same outcome • Extinction burst • Variability of behaviour • Aggression and frustration • Spontaneous recovery
Behaviour Modification • Also “behaviour analysis” • Alter behaviour via operant conditioning • Therapy • Reinforcement vs. punishment
Problems with Punishment in Behaviour Modification • Application of the punisher • Incorrect use of punishment – Creates issues or exacerbates punishment consequences • Tolerance – Start with strong punisher – Gradually reduce • General reluctance to administer
Possible Consequences of Punishment • Escape • Aggression, violence – At punisher, self, other • Apathy – General suppression of other behaviours • Abuse – Permanent damage • Imitation
Alternatives to Using Punishment
Response Prevention • Make it impossible to do punishable behaviour • Circumvention • Younger children
Extinction • Identify reinforcer of behaviour • Withhold reinforcer • Difficult to ID reinforcer • Extinction bursts • Slow
Differential Reinforcement • Differential reinforcement of low responses (DRL) – Only reinforce behaviour when response occurs at low frequency • Differential reinforcement of zero responses (DR0) – Reinforcement contingent on not performing behaviour at all (in some time period)
• Differential reinforcement of alternative behaviour (DRA) – Reinforcer gained from undesired behaviour now only available when some alternative behaviour done • Differential reinforcement of incompatible behaviour (DRI) – Reinforce behaviour completely incompatible with undesired response
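The DRL and zero-response rules can be sketched as a decision made at the end of each observation interval. This is a simplified illustration under stated assumptions: responses are counted per interval, the DRL limit of 2 is arbitrary, zero responses is treated as meeting the DRL criterion, and the function name is hypothetical.

```python
def should_reinforce(schedule, responses_in_interval, drl_limit=2):
    """Decide whether to deliver the reinforcer at the end of an interval.

    schedule: "DRL" (low rate) or "DR0" (zero responses).
    drl_limit: assumed maximum number of responses still counted as "low".
    """
    if responses_in_interval < 0:
        raise ValueError("responses_in_interval cannot be negative")
    if schedule == "DR0":
        # Reinforce only if the behaviour did not occur at all.
        return responses_in_interval == 0
    if schedule == "DRL":
        # Reinforce only if the behaviour stayed at or below the limit.
        return responses_in_interval <= drl_limit
    raise ValueError("schedule must be 'DRL' or 'DR0'")


for count in (0, 2, 5):
    print(count, should_reinforce("DRL", count), should_reinforce("DR0", count))
```

DRA and DRI are not modelled here because they depend on which alternative behaviour occurs, not on a count.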
Noncontingent Reinforcement • Provide desired reinforcer on regular basis regardless of what is being done • No correlation between response and outcome • May work because subject gets reinforcer for “free” • Problems if reinforcer comes after some other undesired behaviour (new acquisition)
Negative Punishment • Removal of pleasant stimulus • Time-out • Popular in human behaviour modification
Other Techniques for Behavioural Deceleration • Overcorrection – Repetitions of alternate, desired behaviour • Restitution • Positive practice – Technically, punishment • Stimulus satiation
Escape and Avoidance
Definitions • Escape – Get away from aversive stimulus that is in progress • Avoidance – Get away from aversive stimulus before it begins
Shuttle Box • Solomon & Wynne (1953) – Dogs – Chamber with barrier; Shock – Light off as signal
Theory Issues • For escape, no ambiguity – Aversive removed, behaviour increases = negative reinforcement • What about avoidance? – Shuttles before shock – Behaviour increases – Nothing obvious removed or delivered • Mowrer & Lamoreaux (1942) – “…not getting something can hardly, in and of itself, qualify as rewarding. ”
Two-Process Theory • Classical and operant conditioning – Shock = US – Fear/pain/jump/twitch/squeal = UR – Darkness = CS – Fear of dark = CR • Fear: heart rate, breathing, stomach cramps, etc. • Negative reinforcement – Removal of fear (CR) • Escape from CS, not avoidance of shock • Two-process treats avoidance as just another type of escape behaviour
Support for Two-Process Theory • Rescorla & LoLordo (1965) • Dog in shuttlebox – No signal – Response gives “safe time” • Pair tone with shock – Tone increases rate of response • CS can amplify avoidance • Conditioned inhibition can reduce avoidance
Problems with Two-Process Theory • Avoidance without observable fear – Heart rate – Not consistent • Fear diminishes with avoidance learning
Measuring Fear • Kamin, Brimer, and Black (1963) – Lever press ---> food – Auditory CS ---> avoidance in shuttle box until: 1, 3, 9, 27 avoidances in a row – CS in operant chamber; check for suppression of lever press
Results • Fear decreases during extended avoidance training • But, avoidance still strong • Even low fear is enough? • [Figure: responding plotted against number of avoidance responses (1, 3, 9, 27)]
Extinction in Avoidance Behaviour • Odd prediction from two-process theory • “Yo-yo” effect • Avoidance should toggle • But! Avoidance is extremely persistent • [Figure: # of US received and successful avoidance, plotted over trials]
One-Process Theory • Classical conditioning component unnecessary • Two interpretations of reinforcer – Molar vs. molecular – Negative reinforcement: Overall reduction in exposure to punishers is reinforcer (text interpretation) – Positive reinforcement: Avoidance itself is reinforcer; subject gets reinforced by “safety” on a trial
Sidman Avoidance Task • Free-operant avoidance – Can avoidance be learned if no warning CS? • Shock at random intervals • Response gives safe time • Extensive training --> learn avoidance – But, usually never perfect – High variability across subjects • Two-process theory suggests: – Time becomes a CS (time elicits fear)
Herrnstein & Hineline (1966) • Rapid and slow shock rate schedules • Response switches schedules • Shocks presented randomly, no signal • Responses give shock reduction • Reduction in shock frequency is reinforcer
Learned Helplessness • Behaviour has no effect on situation • Generalizes • Laboratory – Give inescapable shocks – Shuttle box – Will not switch sides – Expectation that behaviour has no effect
Learned Helplessness in Humans • Depression • Situations beyond your control • Three dimensions – Situation: specific or global – Attribute: internal or external – Time: short-term or long-term
Therapeutic Application • Confidence building (“can not fail”) – Implementation issues • Tasks that can be successfully completed – Produces immunization – Escapable condition … inescapable condition • Learned helplessness likely to develop
Theories of Operant Conditioning
Hull’s Drive Reduction Theory • Animals have motivational states (drives) • Necessary for survival • Reinforcers are things that reduce drives • Physiological value – Reduce physiological state
Drive Reduction Reinforcers • Works well with primary reinforcers • Many secondary reinforcers have no physiological value • Hull: association links secondary to drive • Some reinforcers hard to classify as primary or secondary • Some increase a physiological state • Some necessities undetectable • Roller coasters • Vitamins • Saccharin
Relative Value Theory & Premack Principle • Treat reinforcers as behaviours • Is it the food, or the behaviour of eating that is the reinforcer? • Behavioural probability scale • Greater or lesser value of behaviours relative to one another • No distinction between primary and secondary
Premack Principle • One behaviour will reinforce a second behaviour – High probability behaviour reinforces low probability behaviour • Baseline probability scale – Time – Rank order • Probability of response = $\frac{\text{time spent on response}}{\text{total time}}$ • Reinforcement relativity – No absolutes
Example • Behaviours – Eat ice cream (I), play video game (V), read book (B) • Baseline (30 minutes) – Student 1: I (2 min), V (8 min), B (20 min) • Scale: I -- V -- B – Student 2: I (8 min), V (20 min), B (2 min) • Scale: B -- I -- V • Student 1: V reinforces I, B reinforces V & I • Student 2: I reinforces B, V reinforces I & B
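The baseline numbers in this example can be turned into a probability scale and a list of which behaviours reinforce which. The sketch below is illustrative; the function names are hypothetical, and the probability is simply time spent on a behaviour divided by total baseline time.

```python
def probability_scale(minutes):
    """Return (probabilities, behaviours ranked from lowest to highest)."""
    total = sum(minutes.values())
    probs = {behaviour: t / total for behaviour, t in minutes.items()}
    ranked = sorted(probs, key=probs.get)
    return probs, ranked


def reinforcement_relations(minutes):
    """Map each behaviour to the lower-probability behaviours it can reinforce."""
    probs, ranked = probability_scale(minutes)
    return {
        high: [low for low in ranked if probs[low] < probs[high]]
        for high in ranked
    }


# Baseline minutes from the example: I = ice cream, V = video game, B = book.
student_1 = {"I": 2, "V": 8, "B": 20}
student_2 = {"I": 8, "V": 20, "B": 2}

print(probability_scale(student_1)[1], reinforcement_relations(student_1))
print(probability_scale(student_2)[1], reinforcement_relations(student_2))
```

The same activity (reading) sits at the top of one student's scale and the bottom of the other's, which is the point of reinforcement relativity.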
Problems • Baseline phase – Fair rating? – How to compare very different behaviours • Time problems – What if time not important to behaviour? – Behaviour duration? – Length of baseline period?
Response Deprivation Theory • Deprived behaviours = reinforcing behaviours • Drop below baseline level of performance • Not relative frequency of one behaviour compared to another (i.e., Premack) • Level of deprivation for a behaviour
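A minimal sketch of the deprivation idea: a behaviour is expected to act as a reinforcer once access to it falls below its own baseline, regardless of how it ranks against other behaviours. The function names and the proportional measure of deprivation are assumptions made for illustration.

```python
def deprivation_level(baseline_minutes, allowed_minutes):
    """Proportion by which access falls below baseline (0.0 = not deprived)."""
    if baseline_minutes <= 0:
        raise ValueError("baseline_minutes must be positive")
    shortfall = max(0.0, baseline_minutes - allowed_minutes)
    return shortfall / baseline_minutes


def is_deprived(baseline_minutes, allowed_minutes):
    """True if the behaviour is restricted below its baseline level."""
    return deprivation_level(baseline_minutes, allowed_minutes) > 0


# A behaviour with an 8-minute baseline, restricted to 4 minutes.
print(deprivation_level(8, 4), is_deprived(8, 4))
# The same behaviour with free access above baseline is not deprived.
print(deprivation_level(8, 10), is_deprived(8, 10))
```

Under this view even a low-probability behaviour can serve as a reinforcer, provided it is restricted below its baseline.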