
Schedules of Reinforcement and Choice

Simple Schedules
• Ratio vs. interval
• Fixed vs. variable

Fixed Ratio
• CRF (continuous reinforcement) = FR 1
• Partial/intermittent reinforcement
• Post-reinforcement pause (PRP)

Causes of the FR PRP
• Fatigue hypothesis
• Satiation hypothesis
• Remaining-responses hypothesis
– The reinforcer is a discriminative stimulus signaling that the next reinforcer will not arrive any time soon

Evidence
• PRP increases as FR size increases
– Does not support satiation
• Multiple FR schedules
– Long and short schedules intermixed
– PRP is longer if the next schedule is long, shorter if the next one is short
– Does not support fatigue
[Figure: record on a multiple FR 10 / FR 40 schedule, each component signaled by a stimulus (S); pauses marked long (L) or short (S) depending on the upcoming ratio]

Fixed Interval
• Also has a PRP
– Not due to remaining responses, though
• Time estimation
• Minimize cost-to-benefit ratio

Variable Ratio
• Steady response pattern
• PRPs unusual
• High response rate

Variable Interval
• Steady response pattern
• Slower response rate than VR

Comparison of VR and VI Response Rates
• Response rate for VR is faster than for VI
• Molecular theories
– Small-scale events
– Reinforcement on a trial-by-trial basis
• Molar theories
– Large-scale events
– Reinforcement over the whole session

IRT Reinforcement Theory
• Molecular theory
• IRT: interresponse time
– Time between two consecutive responses
• VI schedule
– Long IRTs reinforced
• VR schedule
– Time irrelevant
– Short IRTs reinforced

• Random number generator (mean = 5) determines the time in seconds (interval) or number of responses (ratio) required for each reinforcer
• 30 reinforcer deliveries
[Table residue: required times/numbers for reinforcement (8, 9, 5, 3, 2, 3, 5, 6, 6, ...) against simulated times between responses (1, 3, 10, 4, 1, ...), with reinforced responses marked for the interval (i) and ratio (r) cases]
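A minimal Python sketch of the same idea (the IRT distribution and schedule parameters here are assumptions, not the slide's values): it emits responses at random intervals under a VI and a VR schedule and compares the IRTs of reinforced responses to the overall average. Under VI the reinforced IRTs come out systematically longer; under VR only the response count matters, so reinforced IRTs mirror the overall distribution.

```python
import random
from statistics import fmean

random.seed(1)

def simulate(schedule, n_responses=10_000, mean_req=5.0):
    """Emit responses with random inter-response times (IRTs) and record
    the IRT of each reinforced response under a VI or VR schedule."""
    all_irts, reinforced_irts = [], []
    clock = 0.0
    armed_at = random.expovariate(1 / mean_req)    # VI: when food is set up
    count, requirement = 0, random.randint(1, 9)   # VR: responses needed (mean 5)
    for _ in range(n_responses):
        irt = random.uniform(0.5, 10.0)            # assumed IRT distribution
        clock += irt
        count += 1
        all_irts.append(irt)
        if schedule == "VI" and clock >= armed_at:
            # The first response after the interval elapses collects the
            # reinforcer; long IRTs are more likely to span that moment.
            reinforced_irts.append(irt)
            armed_at = clock + random.expovariate(1 / mean_req)
        elif schedule == "VR" and count >= requirement:
            # Only the response count matters; the reinforced IRT is just
            # a random draw, so elapsed time is irrelevant.
            reinforced_irts.append(irt)
            count, requirement = 0, random.randint(1, 9)
    return fmean(all_irts), fmean(reinforced_irts)

for sched in ("VI", "VR"):
    overall, reinforced = simulate(sched)
    print(f"{sched}: mean IRT overall = {overall:.2f} s, "
          f"mean IRT of reinforced responses = {reinforced:.2f} s")
```

Running this shows the VI-reinforced IRTs averaging well above the overall mean, while the VR-reinforced IRTs track it, which is the asymmetry the theory uses to explain the faster VR response rate.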

Response-Reinforcer Correlation Theory
• Molar theory
• Response-reinforcer relationship across the whole experimental session
– Long-range reinforcement outcome
– Trial-by-trial events unimportant
• Criticism: too cognitive
[Figure: feedback functions plotting reinforcers/hour (0-100) against responses/minute (0-100) for VR 60 and VI 60-sec schedules]

Choice
• Two-key/lever protocol
• Ratio-ratio
• Interval-interval
– Typically VI-VI
• CODs (changeover delays)

Matching Law
B1/(B1+B2) = R1/(R1+R2), or equivalently B1/B2 = R1/R2
• B = behaviour (responses)
• R = reinforcement
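As a quick worked illustration (the reinforcement rates are hypothetical), the proportional form of the matching law can be computed directly:

```python
def matching_proportion(r1, r2):
    """Matching law, proportional form: B1/(B1+B2) = R1/(R1+R2)."""
    return r1 / (r1 + r2)

# Hypothetical: key 1 delivers 40 reinforcers/hour, key 2 delivers 20.
print(matching_proportion(40, 20))  # 0.667 -> two-thirds of responses on key 1
```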

Bias
• Spend more time on one alternative than predicted
• Side preferences
• Biological predispositions
• Quality and amount
• Undermatching, overmatching

Qualities and Amounts
• Q1: quality of first reinforcer; Q2: quality of second reinforcer
• A1: amount of first reinforcer; A2: amount of second reinforcer
• These enter the matching law as additional ratios: B1/B2 = (R1/R2)(Q1/Q2)(A1/A2)

Undermatching
• Most common
• Response proportions less extreme than reinforcement proportions

Overmatching
• Response proportions are more extreme than reinforcement proportions
• Rare
• Found when a large penalty is imposed for switching
– e.g., a barrier between keys

Undermatching/Overmatching
[Figure: response proportion B1/(B1+B2) plotted against reinforcement proportion R1/(R1+R2), both from 0 to 1; perfect matching is the diagonal, and undermatching curves bow toward 0.5]

Baum’s Variation
B1/B2 = b(R1/R2)^s
• s = sensitivity of behaviour relative to rate of reinforcement
– Perfect matching: s = 1
– Undermatching: s < 1
– Overmatching: s > 1
• b = response bias
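A small sketch of Baum's equation (the 3:1 reinforcement ratio and the s values are hypothetical) showing how s < 1 and s > 1 produce less and more extreme response allocations than perfect matching:

```python
def generalized_matching(r1, r2, b=1.0, s=1.0):
    """Baum's variation of the matching law: B1/B2 = b * (R1/R2)**s."""
    return b * (r1 / r2) ** s

# Hypothetical 3:1 reinforcement ratio in favour of key 1, no bias (b = 1).
for label, s in [("perfect matching", 1.0), ("undermatching", 0.6),
                 ("overmatching", 1.4)]:
    b1_b2 = generalized_matching(3.0, 1.0, s=s)
    prop = b1_b2 / (1 + b1_b2)   # convert the ratio to a proportion B1/(B1+B2)
    print(f"{label} (s = {s}): B1/B2 = {b1_b2:.2f}, proportion = {prop:.2f}")
```

With s = 1 the predicted proportion is 0.75; s = 0.6 pulls it toward 0.5 (about 0.66) and s = 1.4 pushes it toward exclusivity (about 0.82), matching the definitions above.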

Matching as a Theory of Choice
• Animals match because they have evolved to do so
• Nice, simple approach, but ultimately wrong
• Consider a VR-VR schedule
– Animals exclusively choose one alternative (whichever ratio is lower)
– The matching law can't explain this

Melioration Theory
• Invest effort in the "best" alternative
• In VI-VI, partition responses to get the best reinforcer-to-response ratio
– Overshooting the goal; feedback loop
• In VR-VR, keep shifting towards the lower schedule, since it gives the best reinforcer-to-response ratio
• The mixture of responding matters over the long run, but trial-by-trial responding shifts the balance
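A toy demonstration of melioration on concurrent VI-VI (the hyperbolic feedback function and the step size are assumptions made for illustration): responding repeatedly shifts toward whichever alternative currently has the higher local reinforcers-per-response rate, and the allocation settles where the local rates are equal, which is exactly matching.

```python
def vi_feedback(b, scheduled):
    """Assumed VI feedback function: obtained reinforcer rate grows with
    response rate b but saturates at the scheduled rate."""
    return scheduled * b / (b + scheduled)

r1_sched, r2_sched, total_b = 60.0, 20.0, 100.0   # hypothetical VI 60/hr vs VI 20/hr
b1 = 50.0                                          # start with an even split
for _ in range(200):
    b2 = total_b - b1
    local1 = vi_feedback(b1, r1_sched) / b1        # reinforcers per response, key 1
    local2 = vi_feedback(b2, r2_sched) / b2        # reinforcers per response, key 2
    b1 += 50.0 * (local1 - local2)                 # meliorate toward the richer side
    b1 = min(max(b1, 1.0), total_b - 1.0)          # keep allocation in bounds

r1 = vi_feedback(b1, r1_sched)
r2 = vi_feedback(total_b - b1, r2_sched)
print(f"response proportion = {b1/total_b:.3f}, "
      f"reinforcer proportion = {r1/(r1 + r2):.3f}")   # both ~0.75: matching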

Optimization Theory
• Optimize reinforcement over the long term
• Minimum work for maximum gain
• Respond to both choices to maximize reinforcement

Momentary Maximization Theory
• Molecular theory
• Select the alternative that has the highest value at that moment
• Short-term vs. long-term benefits

Delay-Reduction Theory
• Immediate or delayed reinforcement
• Basic principles of the matching law, and...
• Choice is directed towards whichever alternative gives the greatest reduction in delay to the next reinforcer
• Molar (matching responses to reinforcement) and molecular (control by the shorter delay) features

Self-Control
• Conflict between short- and long-term choices
• Choice between a small, immediate reward and a larger, delayed reward
• Self-control is easier if the immediate reinforcer is delayed or harder to get

Value-Discounting Function
• V = M/(1 + KD)
– V = value of reinforcer
– M = reward magnitude
– K = discounting rate parameter
– D = reward delay
• Set M = 10, K = 5
– If D = 0, then V = M/(1 + 0) = 10
– If D = 10, then V = M/(1 + 5*10) = 10/51 = 0.196

Reward Size & Delay
• Set M = 5, K = 5, D = 1
– V = 5/(1 + 5*1) = 5/6 = 0.833
• Set M = 10, K = 5, D = 5
– V = 10/(1 + 5*5) = 10/26 = 0.385
• To get the same V with D = 5, need to set M = 21.66
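The arithmetic in the last two slides is easy to check in code; this sketch simply reimplements the slide's formula and reproduces the numbers:

```python
def value(m, k, d):
    """Value-discounting function from the slides: V = M / (1 + K*D)."""
    return m / (1 + k * d)

print(value(10, 5, 0))    # 10.0
print(value(10, 5, 10))   # 0.196...
print(value(5, 5, 1))     # 0.833...
print(value(10, 5, 5))    # 0.385...

# Magnitude needed for the same value at a longer delay:
# V = M/(1 + K*D)  =>  M = V * (1 + K*D)
print(value(5, 5, 1) * (1 + 5 * 5))  # 21.67 (the slide truncates to 21.66)
```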

Ainslie-Rachlin Theory
• Value of a reinforcer decreases as the delay between the choice and getting the reinforcer increases
• Choose the reinforcer with the higher value at the moment of choice
• Ability to change one's mind; binding decisions
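A small sketch (the magnitudes, delays, and K are hypothetical) of the preference reversal this theory predicts: far from both rewards the larger-later reward has the higher discounted value, but as the small reward draws near its value overtakes, so the choice flips unless a binding decision was made earlier.

```python
def value(m, k, d):
    """Hyperbolic discounting, as above: V = M / (1 + K*D)."""
    return m / (1 + k * d)

K = 0.5
small, t_small = 4.0, 10.0    # small reward available at t = 10 (hypothetical)
large, t_large = 10.0, 14.0   # larger reward available at t = 14 (hypothetical)

for t in range(0, 11, 2):     # the moment at which the choice is made
    v_s = value(small, K, t_small - t)
    v_l = value(large, K, t_large - t)
    pick = "small-sooner" if v_s > v_l else "larger-later"
    print(f"t = {t:2d}: V(small) = {v_s:.2f}, V(large) = {v_l:.2f} -> {pick}")
```

At t = 0 through t = 8 the larger-later reward wins, but by t = 10 the small-sooner reward's value has overtaken it: the reversal that makes binding commitments useful.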