140.656 Multi-Level Statistical Models
If you did not receive the welcome email from me, email me at: tlouis@jhsph.edu
Term 4, 2006 BIO 656 --Multilevel Models

ROOM CHANGE, AGAIN!
• Starting Thursday, March 30th, and henceforth, lectures will be in W2030
• Labs will still be in W2009

Prerequisites, Resources and Grading

Learning Objectives

Content & Approach

Approach
• Lectures include basic illustrations and case studies, structuring an approach and interpreting results
  – Labs address computing and amplify on the foregoing
• My approach is formal, but not “mathematical”
• To understand MLMs, you need a very good understanding of single-level models
  – If you understand these, you are ready to multi-level!

Structure

RULES FOR HOMEWORK, MID-TERM AND PROJECT
Homework
• Must be individually prepared, but you can get help
• Homework due dates should be honored
• Turn in hard copy for grading
The in-class midterm
• Must be prepared absolutely independently
• During the exam, no advice or information can be obtained from others
• You can use your notes and reference materials
The term project
• Must be individually prepared, but you can get help
• Must be electronically submitted

Handouts and the Web
• Virtually all course materials will be on the web
• Check frequently for updates
• I’ve provided hard copy of the general information sheet
• However, other lectures will be on the web in PowerPoint format and won’t be handed out
• Download to your computer so you have an electronic version of each part
• Print if you need hard copy, but do it 4 or 6 to a page to save paper
• More generally, try to “go electronic,” printing sparingly

COMPUTING & DATA
• We will support WinBUGS, Stata
• We provide partial support for SAS, which should be used only by current SAS users; we aren’t teaching it from scratch
• Some homeworks require use of WinBUGS and another “traditional” program (Stata, SAS, R, ...)
• We provide datasets, including some in the WinBUGS examples

WHY BUGS?
• Freeware!
• In MLMs, it’s important to see distributions
  – e.g., skewness of sampling distribution of variance component estimates
• It’s important to incorporate all uncertainties in estimating random effects
• Note that WinBUGS isn’t very data-input friendly
• And, it’s difficult to produce P-values

STATISTICAL MODELS
• A statistical model is an approximation
• Almost never is there a “correct” or “best” model, no holy grail
• A model is a tool for structuring a statistical approach and addressing a scientific question
• An effective model combines the data with prior information to address a question

MULTI-LEVEL MODELS
• Biological, physical, psycho/social processes that influence health occur at many levels:
  – Cell → Organ → Person → Family → Neighborhood → City → Society → ... → Solar system
  – Crew → Vessel → Fleet → ...
  – Block Group → Tract → ...
  – Visit → Patient → Physician → Clinic → HMO → ...
• Covariates can be at each level
• Many “units of analysis”
• More modern and flexible parlance and approach: “many variance components”

Example: Alcohol Abuse
• Cell: neurochemistry
• Organ: ability to metabolize ethanol
• Person: genetic susceptibility to addiction
• Family: alcohol abuse in the home
• Neighborhood: availability of bars
• Society: regulations; organizations; social norms

ALCOHOL ABUSE: A multi-level, interaction model
• Interaction between existence of bars & state drunk-driving laws
• Alcohol abuse in a family & ability to metabolize ethanol
• Genetic predisposition to addiction & household environment
• State regulations about intoxication & job requirements

Many names for similar, but not identical, models, analyses and goals
• Multi-Level Models
• Random effects models
• Mixed models
• Random coefficient models
• Hierarchical models
• Bayesian Models

We don’t need MLMs
• If your question is about slopes on regressors, you can run a standard regression and (usually) get valid slope estimates
  $Y = \beta_0 + \beta_1(\text{areal monitor}) + \beta_2(\text{home monitor}) + \dots$
  $Y = \beta_0 + \beta_1(\text{zipcode income}) + \beta_2(\text{personal income}) + \dots$
  $\mathrm{logit}(P) = \dots$
• Analysis can be followed by computing a “robust” SE to get valid inferences
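As a rough illustration of the “standard regression plus robust SE” route, the sketch below fits a one-regressor least-squares line and computes a cluster-robust (sandwich) standard error for the slope. The slide does not say which robust estimator is meant; the basic cluster sandwich without small-sample correction, the function name `ols_cluster_robust`, and the tiny data set are all assumptions made for illustration.

```python
import math
from collections import defaultdict

def ols_cluster_robust(x, y, cluster):
    """Simple regression of y on x with a cluster-robust SE for the slope.

    Assumption: the basic cluster sandwich, no small-sample correction.
    """
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    xc = [xi - xbar for xi in x]                     # centered regressor
    sxx = sum(v * v for v in xc)
    slope = sum(v * (yi - ybar) for v, yi in zip(xc, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Add up (centered x) * residual within each cluster, then square and sum
    score = defaultdict(float)
    for v, e, g in zip(xc, resid, cluster):
        score[g] += v * e
    var_slope = sum(s * s for s in score.values()) / sxx ** 2
    return intercept, slope, math.sqrt(var_slope)

# Hypothetical data: two clusters (say, two zip codes) with two people each
x = [-1.0, 1.0, -1.0, 1.0]
y = [0.0, 2.0, 1.0, 5.0]
cluster = [0, 0, 1, 1]

intercept, slope, se_slope = ols_cluster_robust(x, y, cluster)
print(intercept, slope, se_slope)
```

The slope estimate itself is the ordinary least-squares one; only the standard error changes, which is the point of the slide.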

We do need MLMs
• If your question is about variance components, you need to build the multi-level model
  $Y_{ijkl} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \epsilon_{ijkl}$
  $\mathrm{Var}(Y_{ijkl}) = \mathrm{Var}(\epsilon_{ijkl}) = V_{\text{Hospital}} + V_{\text{Clinic}} + V_{\text{Physician}} + V_{\text{Patient}} + V_{\text{unexplained}}$
• These variances depend on what Xs are in the model

We do need MLMs
• To create a broad class of correlation structures
  – Longitudinal correlations
  – Nested correlations
• To structure the improvement of unit-level estimates (latent effects) and to make unit-level predictions

MLMs are effective in
• Producing “working models” that incorporate stochastic realities
• Producing efficient population estimates
• Broadening the inference beyond “these units”
• Protecting against some types of informative missing data processes
• Producing correlation structures
• Generating “overdispersed” versions of standard models
• Structuring estimation of latent effects
But MLMs can be fragile, and care is needed

MLMs are not and should not be
• A religion
• A truth
• The only way to model multi-level data!

Improving individual-level estimates
Similar to the BUGS rat data
• Dependent variable ($Y_{ij}$) is weight for rat “i” at age $X_{ij}$
  $i = 1, \dots, I\ (= 10)$; $j = 1, \dots, J\ (= 5)$
  $X_{ij} = X_j = (-14, -7, 0, 7, 14) = (8 - 22,\ 15 - 22,\ 22 - 22,\ 29 - 22,\ 36 - 22)$
  $Y_{ij} = b_{i0} + b_{i1} X_j + \epsilon_{ij}$
  – As usual, the intercept depends on the centering
• Analyses
  – Each rat has its own line
  – All rats follow the same line: $b_{i0} = \beta_0$, $b_{i1} = \beta_1$
  – A compromise between these two
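The first two analyses can be sketched in a few lines: one least-squares line per rat, and one pooled line for all rats. The ages and the centering at 22 come from the slide; the weights for the two rats below are made up for illustration (they are not the BUGS rat data), and `fit_line` is a hypothetical helper name.

```python
AGES = [8, 15, 22, 29, 36]
X = [age - 22 for age in AGES]        # centered ages: (-14, -7, 0, 7, 14)

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Hypothetical weights for two rats (illustration only)
rats = {
    "rat A": [116, 158, 200, 242, 284],
    "rat B": [112, 161, 210, 259, 308],
}

# Analysis 1: each rat has its own line
own_lines = {name: fit_line(X, y) for name, y in rats.items()}

# Analysis 2: all rats follow the same line (pool every observation)
pooled_x = X * len(rats)
pooled_y = [w for y in rats.values() for w in y]
pooled_line = fit_line(pooled_x, pooled_y)

print(own_lines)
print(pooled_line)
```

Because the ages are centered, each intercept is the fitted weight at age 22, which is what “the intercept depends on the centering” means in practice.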

Each rat has its own (LSE, MLE) line (with the population line)
[Figure: per-rat fitted lines, with the population line labeled “Pop line”]

A multi-level model: Each rat has its own line, but the lines come from the same distribution
• The $b_{i0}$ are independent Normal($\beta_0$, $\tau_0^2$)
• The $b_{i1}$ are independent N($\beta_1$, $\tau_1^2$)
Overdispersion
• Sample variance of the OLS estimated intercepts:
  $345 = SE_{\text{int}}^2 + \tau_0^2 = 320 + \tau_0^2$, so $\tau_0^2 = 25$, $\tau_0 = 5$
• Sample variance of the OLS estimated slopes:
  $4.25 = SE_{\text{slope}}^2 + \tau_1^2 = 3.25 + \tau_1^2$, so $\tau_1^2 = 1.00$, $\tau_1 = 1.00$
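The overdispersion arithmetic is a method-of-moments calculation: subtract the sampling variance of the per-rat estimates from their observed sample variance, and what is left estimates the between-rat variance. A minimal sketch using the slide's numbers follows; the helper name `between_unit_variance` and the floor at zero are assumptions added for illustration.

```python
import math

def between_unit_variance(sample_var_of_estimates, sampling_var):
    """Observed spread of the unit estimates minus sampling noise.

    Floored at 0, since a variance cannot be negative (an added safeguard,
    not something stated on the slide).
    """
    return max(sample_var_of_estimates - sampling_var, 0.0)

# Intercepts: 345 = SE_int^2 + between-rat variance = 320 + between-rat variance
var_int = between_unit_variance(345.0, 320.0)
sd_int = math.sqrt(var_int)

# Slopes: 4.25 = SE_slope^2 + between-rat variance = 3.25 + between-rat variance
var_slope = between_unit_variance(4.25, 3.25)
sd_slope = math.sqrt(var_slope)

print(var_int, sd_int)      # 25.0 5.0
print(var_slope, sd_slope)  # 1.0 1.0
```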

A compromise: each rat has its own line, but the lines come from the same distribution
[Figure: compromise lines for each rat, with the population line labeled “Pop line”]

ONE-WAY RANDOM EFFECTS ANOVA

Simulated “Neighborhood Clustering”
• Random mean for each of 10 neighborhoods (J = 10)
  $b_1, b_2, \dots, b_{10}$ (iid) $N(10, 9)$
• Random deviation from neighborhood mean for each of 10 persons in each neighborhood (n = 10)
  $Y_{ij} = b_j + e_{ij}$, $e_{ij}$ (iid) $N(0, 4)$
Conditional Independence
Over-dispersion: Variance of each point is 13 (= 4 + 9)
Correlation: Measurements within each cluster are correlated
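The simulation described above is easy to reproduce. The sketch below draws neighborhood means from N(10, 9) and person-level deviations from N(0, 4), then checks the two consequences listed: a marginal variance near 13 and a positive correlation within clusters. The function names, the seed, and the large check sample (20,000 clusters of 2) are assumptions for illustration; the course used its own simulated data.

```python
import random

def simulate_clusters(n_clusters=10, n_per_cluster=10, mean=10.0,
                      var_between=9.0, var_within=4.0, seed=656):
    """Return a list of clusters, each a list of Y_ij = b_j + e_ij."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_clusters):
        b = rng.gauss(mean, var_between ** 0.5)          # neighborhood mean
        data.append([b + rng.gauss(0.0, var_within ** 0.5)
                     for _ in range(n_per_cluster)])
    return data

def marginal_variance(data):
    """Sample variance of all observations, ignoring the clustering."""
    values = [y for row in data for y in row]
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def within_pair_correlation(data):
    """Correlation between the first two observations in each cluster."""
    first = [row[0] for row in data]
    second = [row[1] for row in data]
    m1 = sum(first) / len(first)
    m2 = sum(second) / len(second)
    cov = sum((a - m1) * (b - m2) for a, b in zip(first, second))
    v1 = sum((a - m1) ** 2 for a in first)
    v2 = sum((b - m2) ** 2 for b in second)
    return cov / (v1 * v2) ** 0.5

small = simulate_clusters()                              # the slide's 10 x 10 layout
big = simulate_clusters(n_clusters=20000, n_per_cluster=2)
print(marginal_variance(big))         # should be close to 13 = 4 + 9
print(within_pair_correlation(big))   # should be close to 9/13
```

With only 10 neighborhoods, as on the slide, the estimates are much noisier than in the large check sample, which is why the estimated and true values on the next slides differ.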

Intra-class Correlation (ICC)
• Correlation of two observations in the same cluster:
  ICC = Var(Between) / Var(Total) = 1 - Var(Within) / Var(Total)
Estimated ICC: 0.67 = (9.8 - 3.2)/9.8
True ICC: 0.69 = 9/(9 + 4) = 9/13
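Both forms of the ICC formula can be checked against the numbers on the slide. In this small sketch the function names are hypothetical; 9.8 is read as the estimated total variance and 3.2 as the estimated within-cluster variance, as the slide's arithmetic implies.

```python
def icc_from_components(var_between, var_within):
    """ICC = Var(Between) / Var(Total), with Total = Between + Within."""
    return var_between / (var_between + var_within)

def icc_from_total(var_total, var_within):
    """ICC = 1 - Var(Within) / Var(Total)."""
    return 1.0 - var_within / var_total

true_icc = icc_from_components(9.0, 4.0)   # 9 / 13
est_icc = icc_from_total(9.8, 3.2)         # (9.8 - 3.2) / 9.8

print(round(true_icc, 2), round(est_icc, 2))  # 0.69 0.67
```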

[Figure: V(b)]

[Figure: plot showing the 45° line, the regression line, and the population line (“Pop line”)]

WEIGHTED MEANS

INFERENCE SPACE (Sanders)
• The choice between fixed and random effects depends in part on the reference population (the inference space)
  – These studies or people
  – Studies or people like these
  – ...

Random Effects should replace “unit of analysis”
• Models contain fixed effects, random effects (via variance components) and other correlation-inducers
• There are many “units” and so in effect no single set of units
• Random effects induce unexplained (co)variance
• Some of the unexplained (co)variance may be explicable by including additional covariates
• MLMs are one way to induce a structure and estimate the REs

PLEASE DO THIS
If you did not receive the welcome email from me, email me at: tlouis@jhsph.edu

ROOM CHANGE, AGAIN!
• Starting Thursday, March 30th, and henceforth, lectures will be in W2030
• Labs will still be in W2009

END OF PART I