Intervention Study: Kenya PRIMR Case Regression Analysis


March 2017
Susan Edwards, RTI International
Sarrynna Sou, RTI International
RTI International is a registered trademark and a trade name of Research Triangle Institute. www.rti.org


Overview
• Linear Regression
  – Interpreting Estimates
  – Replicating T-Test Analysis
  – Replicating DiD Analysis
  – Controlling for Other Variables
• Logistic Regression
  – Interpreting Odds Ratios
  – STATA Code: Similar to Linear Regression
  – Controlling for Other Variables


Linear Regression Analysis


Linear Regression Analysis – Example


Linear Regression Analysis – Reference Cell Coding
• Recall: orf = 75 + 2.7 I(female) – 3.7 (age)
  – This model suggests that girls read on average 2.7 words per minute (wpm) more than boys when controlling for student age.
• Reference Cell Coding: One level of a categorical variable is designated as the reference.
  – All other estimates are presented in comparison to the reference.
• Example: Female (2 levels)
  – 0 = Male (reference)
  – 1 = Female
• Interpretations:
  – 2.7 I(female): 2.7 is the difference in wpm between females and males (females minus males), holding age constant.
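To see what reference cell coding means numerically, here is a minimal Python sketch (not part of the original STATA materials) that plugs values into the example model above. The function name `predict_orf` and the age of 10 are illustrative assumptions.

```python
def predict_orf(female: int, age: float) -> float:
    """Predicted ORF (wpm) from the example model orf = 75 + 2.7*I(female) - 3.7*age.

    female is the 0/1 indicator; 0 = male is the reference level.
    """
    return 75 + 2.7 * female - 3.7 * age


# Compare a girl and a boy of the same (hypothetical) age.
girl = predict_orf(female=1, age=10)
boy = predict_orf(female=0, age=10)
gap = girl - boy  # equals the female coefficient, 2.7, at any fixed age
print(f"girl: {girl:.1f} wpm, boy: {boy:.1f} wpm, gap: {gap:.1f} wpm")
```

Because the reference level (male) adds nothing to the prediction, the female coefficient is exactly the gap between the two groups at any fixed age.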


Linear Regression Analysis – Reference Cell Coding
• Example with More than 2 Levels: Age Category (3 levels)
  – 0 = Younger than Grade Level (below 7)
  – 1 = At Grade Level (7 or 8)
  – 2 = Older than Grade Level (above 8)
• Model for ORF: ORF = 50 + 0.2 I(At Grade Level) – 13 I(Older than Grade Level)
• Questions:
  – What is the average fluency for students younger than grade level?
  – What is the average fluency for students at grade level?
  – Do students younger than grade level perform better on average than students older than grade level?
  – Do students at grade level perform better on average than students older than grade level?
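The answers to these questions come straight from the coefficients. A minimal Python sketch, assuming the example model above exactly; the names `mean_orf`, `COEFFICIENTS`, and `LABELS` are hypothetical:

```python
# Coefficients from the example ORF model; level 0 (younger than grade level) is the reference.
INTERCEPT = 50.0
COEFFICIENTS = {0: 0.0, 1: 0.2, 2: -13.0}
LABELS = {0: "Younger than grade level", 1: "At grade level", 2: "Older than grade level"}


def mean_orf(age_cat: int) -> float:
    """Average ORF (wpm) for an age category: intercept plus that level's coefficient."""
    return INTERCEPT + COEFFICIENTS[age_cat]


for level, label in LABELS.items():
    print(f"{label}: {mean_orf(level):.1f} wpm")

# Comparing two levels is a difference of their predicted means.
print(f"Younger minus older: {mean_orf(0) - mean_orf(2):.1f} wpm")
print(f"At grade level minus older: {mean_orf(1) - mean_orf(2):.1f} wpm")
```

The reference level's mean is the intercept itself; every other level is read relative to it.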


Categorical vs. Continuous Independent Variables
• Why do we care? STATA cares: it handles the two types differently in a model.
• Categorical
  – Definition: A variable whose values fall into distinct categories.
  – Examples: gender; age category
  – STATA code: Start the variable with "i." followed by the variable name: i.<variable name>
  – Uses reference cell coding
• Continuous
  – Definition: A variable that can, in theory, take any value along a continuous (possibly unbounded) range.
  – Examples: orf; age; reading comprehension score? (generally ranges from 0 to 5)
  – STATA code: List the variable name in the equation line.
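The practical difference is how each variable enters the model. A minimal Python sketch of the idea behind the "i." prefix: a categorical variable is expanded into 0/1 indicator columns with the base level omitted (lowest level by default), while a continuous variable stays a single column. This illustrates the concept, not STATA's internal implementation; `indicator_columns` and the enrolment values are made up.

```python
def indicator_columns(values, base=None):
    """Expand a categorical variable into 0/1 indicator columns, omitting the base level.

    The base defaults to the lowest level, mirroring reference cell coding.
    """
    levels = sorted(set(values))
    if base is None:
        base = levels[0]
    return {lvl: [1 if v == lvl else 0 for v in values] for lvl in levels if lvl != base}


age_cat = [0, 1, 2, 1, 0]               # categorical: expanded into indicators
enrolment = [310, 450, 120, 980, 640]   # continuous (hypothetical values): one column as-is

columns = indicator_columns(age_cat)
# Design rows: intercept, I(age_cat == 1), I(age_cat == 2), enrolment
design = [[1, columns[1][i], columns[2][i], enrolment[i]] for i in range(len(age_cat))]
print(columns)
print(design)
```

Level 0 gets no column, which is exactly why its mean is absorbed into the intercept.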


Linear Regression Analysis – STATA Example


Linear Regression Analysis – STATA Activity
• Recall: STATA code to fit a model for gender and age:
    svy: reg eq_orf i.female age
• Fit a linear model for English fluency (eq_orf) that accounts for the following school factors: nonformal; enrolment.
    svy: reg eq_orf i.nonformal enrolment
  – Why does nonformal have an "i." in front of the variable name?
  – What type of variable is enrolment in this model?
  – How would we change enrolment to be a categorical variable?
  – Would the model work if we typed the following?
      svy: reg enrolment i.nonformal eq_orf


T-Test Results with Linear Regression in STATA
• Recall: T-tests compare the means of two groups.
• Example:
    ttest eq_orf, by(treat_phase)
  – Is there a difference between baseline and endline scores?

                      October 2012    October 2013
  Mean (N)            48 wpm (913)    53 wpm (922)
  Difference (S.E.)   4.4 wpm (1.7)
  T-Stat (DOF)        2.59 (1833)
  H0: μ(Oct 2012) = μ(Oct 2013); Ha: μ(Oct 2012) ≠ μ(Oct 2013)
  P-Value = 0.0095; Reject H0

• How can we use linear regression to duplicate these results?


T-Test Results with Linear Regression in STATA
• Recall:
    ttest eq_orf, by(treat_phase)
• How can we use linear regression to duplicate these results?
  – How many variables are used in the ttest command? Two: eq_orf and treat_phase.
  – Use a linear regression model that contains only the two variables of interest.
  – What would the STATA code for the model look like?
      reg eq_orf i.treat_phase


T-Test Results with Linear Regression in STATA
• Recall:
    ttest eq_orf, by(treat_phase)
    reg eq_orf i.treat_phase

                      October 2012    October 2013
  Mean (N)            48 wpm (913)    53 wpm (922)
  Difference (S.E.)   4.4 wpm (1.7)
  T-Stat (DOF)        2.59 (1833)
  H0: μ(Oct 2012) = μ(Oct 2013); Ha: μ(Oct 2012) ≠ μ(Oct 2013)
  P-Value = 0.0095; Reject H0
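Why does the regression reproduce the t-test? With a single 0/1 regressor, the OLS slope is the difference in group means and its standard error equals the pooled (equal-variance) t-test standard error. A minimal Python sketch on made-up scores (not PRIMR data) that checks this numerically; function names are hypothetical, and no survey weights are involved.

```python
import math
from statistics import mean


def pooled_t_test(group0, group1):
    """Equal-variance two-sample t-test: returns (mean1 - mean0, standard error, t)."""
    n0, n1 = len(group0), len(group1)
    m0, m1 = mean(group0), mean(group1)
    # Pooled variance: within-group sums of squares over n0 + n1 - 2 degrees of freedom.
    ss = sum((x - m0) ** 2 for x in group0) + sum((x - m1) ** 2 for x in group1)
    s2 = ss / (n0 + n1 - 2)
    se = math.sqrt(s2 * (1 / n0 + 1 / n1))
    return m1 - m0, se, (m1 - m0) / se


def ols_on_indicator(y, d):
    """OLS of y on an intercept and one 0/1 indicator d: returns (slope, standard error, t)."""
    n = len(y)
    d_bar, y_bar = mean(d), mean(y)
    sxx = sum((di - d_bar) ** 2 for di in d)
    slope = sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y)) / sxx
    intercept = y_bar - slope * d_bar
    rss = sum((yi - intercept - slope * di) ** 2 for di, yi in zip(d, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se, slope / se


# Hypothetical ORF scores (wpm).
baseline = [40, 45, 50, 52, 48, 39]
endline = [50, 55, 49, 58, 53]

t_diff, t_se, t_stat = pooled_t_test(baseline, endline)
y = baseline + endline
d = [0] * len(baseline) + [1] * len(endline)  # plays the role of i.treat_phase
r_slope, r_se, r_stat = ols_on_indicator(y, d)

print(f"t-test:     diff={t_diff:.3f}  se={t_se:.3f}  t={t_stat:.3f}")
print(f"regression: coef={r_slope:.3f}  se={r_se:.3f}  t={r_stat:.3f}")
```

One caveat on signs: STATA's ttest output typically labels the difference as mean(0) − mean(1), so its sign can be the opposite of the regression coefficient; the magnitudes and the p-value agree.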


T-Test Results with Linear Regression in STATA
• Recall:
    ttest eq_orf, by(treat_phase)
    reg eq_orf i.treat_phase
• Wait! The data are weighted!
    svy: reg eq_orf i.treat_phase

  Results from the ttest and reg commands:
                      October 2012    October 2013
  Mean (N)            48 wpm (913)    53 wpm (922)
  Difference (S.E.)   4.4 wpm (1.7)
  T-Stat (DOF)        2.59 (1833)
  H0: μ(Oct 2012) = μ(Oct 2013); Ha: μ(Oct 2012) ≠ μ(Oct 2013)
  P-Value = 0.0095; Reject H0


Linear Regression in STATA – Controlling for Other Variables
• Want to Know: The effect of certain variables when other variables we know to be influential are controlled for.
• Recall: orf = 75 + 2.7 I(female) – 3.7 (age)
  – In this model, we may already know that older students are less fluent readers because they are repeating the grade or have taken a long break between school years. But we want to know whether gender influences fluency once age is controlled for.
• When do we use models with multiple variables?
  – To determine demographic and SSME impact
• What variables must be in these models?
  – Variables that we know strongly influence the outcome
  – Sample design variables: Treatment; Gender; Time
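Controlling for age matters when age and gender are related. A minimal Python sketch on hypothetical, noise-free data generated from the recalled model, in which the girls happen to be one year younger on average: fitting with age recovers the 2.7 wpm gender effect, while the gender-only fit mixes the gender effect with the age difference. The small solver and the data are illustrative assumptions, not the PRIMR analysis.

```python
def solve(a, b):
    """Solve a x = b with Gaussian elimination and partial pivoting (small systems only)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data: boys aged 8-11, girls aged 7-10, outcomes from orf = 75 + 2.7*female - 3.7*age.
female = [0, 0, 0, 0, 1, 1, 1, 1]
age = [8, 9, 10, 11, 7, 8, 9, 10]
orf = [75 + 2.7 * f - 3.7 * a for f, a in zip(female, age)]

adjusted = ols([[1, f, a] for f, a in zip(female, age)], orf)  # analogous to: reg orf i.female age
unadjusted = ols([[1, f] for f in female], orf)                # analogous to: reg orf i.female

print("gender effect controlling for age:", round(adjusted[1], 2))
print("gender gap ignoring age:", round(unadjusted[1], 2))
```

In this made-up sample the unadjusted gap (6.4) equals 2.7 plus 3.7 times the one-year age difference between groups, which is the confounding the slide warns about.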


Linear Regression in STATA – Controlling for Other Variables – Example
• Fit a model for English fluency that accounts for treatment, time, gender, and formal/nonformal school type.
• Question of Interest: Once design variables are controlled for, is there a difference between students in formal and nonformal schools?
• STATA Code:
    svy: reg eq_orf i.treatment i.treat_phase i.treatment#i.treat_phase i.female i.nonformal
• Interpretation: Students in nonformal schools read on average 29 wpm more than students in formal schools when the design variables (treatment, time, and gender) are controlled for.
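The i.treatment#i.treat_phase term is what makes this a difference-in-differences (DiD) model. In the model with only treatment, treat_phase, and their interaction, OLS fits each of the four cell means exactly, so each coefficient is a simple difference of cell means. A minimal Python sketch with hypothetical cell means (not PRIMR results); once i.female and i.nonformal are added, the interaction becomes an adjusted DiD and no longer equals this raw calculation.

```python
def interaction_coefficients(cell_means):
    """Coefficients of y = b0 + b1*T + b2*P + b3*T*P from the four treatment-by-phase cell means.

    The model is saturated, so the coefficients are differences of cell means;
    b3 is the difference-in-differences estimate.
    """
    c0, c1 = cell_means[(0, 0)], cell_means[(0, 1)]  # control: baseline, endline
    t0, t1 = cell_means[(1, 0)], cell_means[(1, 1)]  # treatment: baseline, endline
    return {
        "b0": c0,                     # control at baseline (reference cell)
        "b1": t0 - c0,                # treatment vs control at baseline
        "b2": c1 - c0,                # change over time in the control group
        "b3": (t1 - t0) - (c1 - c0),  # extra change in the treatment group (DiD)
    }


# Hypothetical cell means in wpm, keyed by (treatment, treat_phase).
means = {(0, 0): 45.0, (0, 1): 48.0, (1, 0): 46.0, (1, 1): 58.0}
coefs = interaction_coefficients(means)
print(coefs)
```

Adding all four coefficients reproduces the treatment group's endline mean, a quick check that the coding is consistent.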


Linear Regression in STATA – Controlling for Other Variables – Activity
• Determine whether any of the other SSME variables affect student English reading fluency (eq_orf).


Linear vs Logistic Regression
• When would you want to use logistic regression?
  – To model binary (two-category) outcome data
  – Estimates: probabilities and odds ratios
• Examples:
  – Zero Scores: 0 = Score above zero on task; 1 = Score equal to zero on task
  – Reading Comprehension of 80% or Better: 0 = Reading comprehension score < 80%; 1 = Reading comprehension score >= 80%
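Building these outcomes is a simple thresholding step. A minimal Python sketch; the helper names are hypothetical, and it assumes comprehension is stored as a percentage from 0 to 100.

```python
def zero_score(score: float) -> int:
    """1 = score equal to zero on the task, 0 = score above zero."""
    return 1 if score == 0 else 0


def comprehension_80_plus(pct_correct: float) -> int:
    """1 = reading comprehension score >= 80%, 0 = score < 80% (0-100 scale assumed)."""
    return 1 if pct_correct >= 80 else 0


scores = [0, 12, 0, 35]   # hypothetical task scores
pcts = [100, 80, 60, 0]   # hypothetical comprehension percentages
print([zero_score(s) for s in scores])
print([comprehension_80_plus(p) for p in pcts])
```

Note that a score of exactly 80% falls in the "1" group, matching the ">= 80%" definition above.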


Linear vs Logistic Regression


Linear vs Logistic Regression – Covariates & Odds Ratios
• Covariates are connected to odds ratios.
• Example: log-odds(Reading Comprehension 80%+) = -3 + 0.76 I(Has English Book)
• Odds Ratio:
  – English Book vs. No English Book = exp(0.76) = 2.14
• Interpretation:
  – On average, the odds that students with English books comprehend at least 80% of a connected text are about 2.14 times the odds for students without English books.
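The odds ratio follows from exponentiating the linear predictor for each group. A minimal Python sketch using the example coefficients: the intercept cancels in the ratio, leaving exp(0.76).

```python
import math

B0, B1 = -3.0, 0.76  # from: log-odds(Reading Comprehension 80%+) = -3 + 0.76 I(Has English Book)

odds_no_book = math.exp(B0)            # odds of 80%+ comprehension without an English book
odds_book = math.exp(B0 + B1)          # odds of 80%+ comprehension with an English book
odds_ratio = odds_book / odds_no_book  # the intercept cancels, leaving exp(B1)

print(f"Odds ratio: {odds_ratio:.2f}")  # Odds ratio: 2.14
```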


Linear vs Logistic Regression – Covariates & Probabilities
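Probabilities come from the same linear predictor passed through the inverse logit, p = 1 / (1 + exp(-(b0 + b1 x))). A minimal Python sketch assuming the example model log-odds(Reading Comprehension 80%+) = -3 + 0.76 I(Has English Book); `prob_80_plus` is a hypothetical name.

```python
import math

B0, B1 = -3.0, 0.76  # intercept and I(Has English Book) coefficient from the example model


def prob_80_plus(has_book: int) -> float:
    """Inverse logit: convert the linear predictor into a probability."""
    eta = B0 + B1 * has_book
    return 1 / (1 + math.exp(-eta))


p_no_book = prob_80_plus(0)
p_book = prob_80_plus(1)
print(f"No English book: {p_no_book:.3f}")
print(f"English book:    {p_book:.3f}")
print(f"Ratio of probabilities: {p_book / p_no_book:.2f}")
```

Because 80%+ comprehension is rare in both groups under this model (roughly 5% and 10%), the ratio of probabilities (about 2.0) is close to the odds ratio of 2.14; for common outcomes the two can differ substantially.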


Logistic Regression Analysis – STATA Example
• Recall: log-odds(Reading Comprehension 80%+) = -3 + 0.76 I(Has English Book)
• NOTE: The code is very similar to linear regression.
• STATA Code:
    svy: logistic eq_read_comp_score_pcnt80 i.e_book, coef
  – Why does e_book have an "i." in front of the variable name?
  – Why doesn't eq_read_comp_score_pcnt80 have an "i."?
  – What is the difference between the two lines of code?
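With the coef option, STATA's logistic reports coefficients (log-odds scale) rather than its default odds ratios. To see where such numbers come from, here is a minimal Python sketch that fits logit(p) = b0 + b1*x by Newton-Raphson maximum likelihood on hypothetical grouped counts (not PRIMR data). It ignores survey weights and design, so it is an unweighted analogue of the model, not what svy computes. With one binary predictor, the fitted b1 equals the log of the 2x2-table odds ratio.

```python
import math


def fit_logit_binary(groups, iters=25):
    """Newton-Raphson MLE for logit(p) = b0 + b1*x with x in {0, 1}.

    groups: list of (x, n, successes) tuples (grouped binomial data).
    """
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, n, s in groups:
            p = 1 / (1 + math.exp(-(b0 + b1 * x)))
            r = s - n * p          # contribution to the score (gradient)
            w = n * p * (1 - p)    # contribution to the information matrix
            g0 += r
            g1 += r * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: b += inverse(information) @ score for the 2x2 case
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    return b0, b1


# Hypothetical counts: (has English book, students, students with 80%+ comprehension)
data = [(0, 200, 10), (1, 100, 10)]
b0, b1 = fit_logit_binary(data)
print(f"coef (log-odds): b0={b0:.4f}, b1={b1:.4f}")
print(f"odds ratio: {math.exp(b1):.4f}")
```

Here the table odds are 10/190 without a book and 10/90 with one, so exp(b1) should match (10/90)/(10/190).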


More Information
Susan Edwards, Research Statistician, 919.316.3541, SEdwards@RTI.org
Sarrynna Sou, Statistician, 919.485.2722, SSou@RTI.org