Part 10 Advanced Topics Statistical Inference and Regression

Part 10: Advanced Topics
Statistical Inference and Regression Analysis: GB.3302.30
Professor William Greene
Stern School of Business, IOMS Department, Department of Economics

Statistics and Data Analysis: Part 10 – Advanced Topics

Advanced topics
  • Nonlinear Least Squares
  • Nonlinear Models – ML Estimation
      - Poisson Regression
      - Binary Choice
  • End of course.

Statistics and Data Analysis: Nonlinear Least Squares

Nonlinear Least Squares

Lanczos 1 Data

Nonlinear Regression

Nonlinear Least Squares
There are no explicit solutions to these equations in the form of b_i = a function of (y, x).

Strategy for Nonlinear LS

NLS Strategy
  • Pick b
  • A. Compute y_i^0 and x_i^0
  • B. Regress y_i^0 on x_i^0
      - This obtains a new b
      - Return to step A, or exit if the new b is the same as the old b

Lanczos 1: First Iteration
Now, repeat the iteration using this as b.

This is the correct answer.

Gauss-Marquardt Algorithm
Starting with b^0:
  • A. Compute regressors x_i^0. Compute residuals e_i^0 = y_i – f(x_i, b^0).
  • B. New b^1 = b^0 + slopes in the regression of e_i^0 on x_i^0.
  • Return to A, or exit if the estimates have converged.
  • This is equivalent to our earlier method.
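The iteration above can be sketched in a few lines of Python. This is a minimal illustration rather than the course's own code: the model f(x, b) = b1*exp(b2*x), the noise-free synthetic data, the starting values, and the function name gauss_newton are all assumptions chosen for the example. Each pass computes the pseudo-regressors (the derivatives of f with respect to b), regresses the current residuals on them, and adds the slopes to the current b. The update in step B is the one usually called the Gauss-Newton step.

```python
import math

def gauss_newton(x, y, b, tol=1e-10, max_iter=50):
    """Fit y = b1 * exp(b2 * x) by repeated linear regressions (illustrative model)."""
    b1, b2 = b
    for _ in range(max_iter):
        # Step A: pseudo-regressors (derivatives of f w.r.t. b1 and b2)
        # and residuals, all evaluated at the current b.
        g1 = [math.exp(b2 * xi) for xi in x]
        g2 = [b1 * xi * math.exp(b2 * xi) for xi in x]
        e = [yi - b1 * math.exp(b2 * xi) for xi, yi in zip(x, y)]
        # Step B: slopes in the regression of e on (g1, g2),
        # obtained from the 2x2 normal equations.
        s11 = sum(a * a for a in g1)
        s12 = sum(a * c for a, c in zip(g1, g2))
        s22 = sum(c * c for c in g2)
        r1 = sum(a * ei for a, ei in zip(g1, e))
        r2 = sum(c * ei for c, ei in zip(g2, e))
        det = s11 * s22 - s12 * s12
        d1 = (s22 * r1 - s12 * r2) / det
        d2 = (s11 * r2 - s12 * r1) / det
        b1, b2 = b1 + d1, b2 + d2
        # Exit when the new b is numerically the same as the old b.
        if abs(d1) + abs(d2) < tol:
            break
    return b1, b2

# Synthetic, noise-free data generated from b = (2.0, -0.5); purely illustrative.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0 * math.exp(-0.5 * xi) for xi in x]
b_hat = gauss_newton(x, y, b=(1.5, -0.4))
print(b_hat)
```

The sketch takes full steps with no safeguard, so it relies on starting values that are reasonably close to the solution.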

Statistics and Data Analysis: Maximum Likelihood: Poisson

Application: Doctor Visits
  • German Individual Health Care data: N = 27,236
  • Model for number of visits to the doctor:
      - Poisson regression
      - Age, Health Satisfaction, Marital Status, Income, Kids

Poisson Regression

Nonlinear Least Squares

Maximum Likelihood Estimation
This defines a class of estimators based on the particular distribution assumed to have generated the observed random variable. The main advantage of ML estimators is that, among all consistent, asymptotically normal estimators, MLEs have optimal asymptotic properties.

Setting up the MLE
The distribution of the observed random variable is written as a function of the parameters to be estimated:
P(y_i | data, β) = probability density | parameters.
The likelihood function is constructed from the density.
Construction: the joint probability density function of the observed sample of data, generally the product when the data are a random sample.

Likelihood for the Poisson Regression

Newton’s Method
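The formulas on these two slides did not survive extraction. As a reference sketch, the standard Poisson regression results are written out below, assuming the usual log-linear mean λ_i = exp(x_i'β). The notation (λ_i, g, H, β^(t)) is assumed here and is not taken from the slides.

```latex
\begin{aligned}
P(y_i \mid x_i, \beta) &= \frac{\exp(-\lambda_i)\,\lambda_i^{y_i}}{y_i!},
  \qquad \lambda_i = \exp(x_i'\beta) \\
\log L(\beta) &= \sum_{i=1}^{N} \left[ -\lambda_i + y_i\, x_i'\beta - \log y_i! \right] \\
g(\beta) &= \frac{\partial \log L}{\partial \beta}
  = \sum_{i=1}^{N} (y_i - \lambda_i)\, x_i \\
H(\beta) &= \frac{\partial^2 \log L}{\partial \beta\, \partial \beta'}
  = -\sum_{i=1}^{N} \lambda_i\, x_i x_i' \\
\beta^{(t+1)} &= \beta^{(t)} - \left[ H(\beta^{(t)}) \right]^{-1} g(\beta^{(t)})
\end{aligned}
```

The last line is the Newton update. Because H does not involve y_i, the actual and expected second derivatives coincide in this model.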

Properties of the MLE
  • Consistent: not necessarily unbiased, however
  • Asymptotically normally distributed: proof based on central limit theorems
  • Asymptotically efficient: among the possible estimators that are consistent and asymptotically normally distributed
  • Invariant: the MLE of g(θ) is g(the MLE of θ)

Computing the Asymptotic Variance
We want to estimate {-E[H]}^(-1). Three ways:
  (1) Just compute the negative of the actual second derivatives matrix and invert it.
  (2) Insert the maximum likelihood estimates into the known expected values of the second derivatives matrix. Sometimes (1) and (2) give the same answer (for example, in the Poisson regression model).
  (3) Since -E[H] is the variance of the first derivatives, estimate this with the sample variance (i.e., mean square) of the first derivatives. This will almost always be different from (1) and (2).
Since they are estimating the same thing, in large samples all three will give the same answer.
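The three estimators can be compared on simulated data. The sketch below is an illustration under stated assumptions, not the course's computation: it simulates a Poisson regression with a constant and one regressor (true coefficients 0.5 and 0.3, chosen arbitrarily), fits it by Newton's method, and then computes the Hessian-based and first-derivative-based standard errors for the slope. The helper names poisson_draw, inv2 and fit_poisson are hypothetical.

```python
import math
import random

def poisson_draw(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def inv2(a11, a12, a22):
    """Inverse of the symmetric matrix [[a11, a12], [a12, a22]] as (i11, i12, i22)."""
    det = a11 * a22 - a12 * a12
    return a22 / det, -a12 / det, a11 / det

def fit_poisson(x, y, max_iter=50, tol=1e-10):
    """Newton's method for E[y|x] = exp(b0 + b1*x); returns (b0, b1)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start at the constant-only fit
    for _ in range(max_iter):
        lam = [math.exp(b0 + b1 * xi) for xi in x]
        # First derivatives: sum of (y_i - lam_i) * x_i
        g0 = sum(yi - li for yi, li in zip(y, lam))
        g1 = sum((yi - li) * xi for yi, li, xi in zip(y, lam, x))
        # Negative Hessian: sum of lam_i * x_i x_i'
        v00, v01, v11 = inv2(sum(lam),
                             sum(li * xi for li, xi in zip(lam, x)),
                             sum(li * xi * xi for li, xi in zip(lam, x)))
        d0, d1 = v00 * g0 + v01 * g1, v01 * g0 + v11 * g1
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

rng = random.Random(12345)
n = 5000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [poisson_draw(math.exp(0.5 + 0.3 * xi), rng) for xi in x]
b0, b1 = fit_poisson(x, y)
lam = [math.exp(b0 + b1 * xi) for xi in x]

# Ways (1) and (2): invert the negative Hessian. In the Poisson model the
# Hessian does not involve y, so the actual and expected versions are identical.
var_hessian = inv2(sum(lam),
                   sum(li * xi for li, xi in zip(lam, x)),
                   sum(li * xi * xi for li, xi in zip(lam, x)))
# Way (3): invert the sum of squares of the first derivatives,
# g_i = (y_i - lam_i) * x_i.
sq = [(yi - li) ** 2 for yi, li in zip(y, lam)]
var_opg = inv2(sum(sq),
               sum(si * xi for si, xi in zip(sq, x)),
               sum(si * xi * xi for si, xi in zip(sq, x)))

se_hessian = math.sqrt(var_hessian[2])  # standard error of the slope
se_opg = math.sqrt(var_opg[2])
print(b0, b1, se_hessian, se_opg)
```

With a correctly specified model and a sample of this size, the two standard errors should be close but not identical.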

Poisson Regression Iterations

MLE NLS

Using the Model. Partial Effects

Effect of Income Depends on Age

Effect of Income | Age

Statistics and Data Analysis: Binary Choice

Case Study: Credit Modeling
  • 1992 American Express analysis of
      - Application process: acceptance or rejection; Y = 0 (reject) or 1 (accept).
      - Cardholder behavior
          * Loan default (D = 0 or 1)
          * Average monthly expenditure (E = $/month)
          * General credit usage/behavior (C = number of charges)
  • 13,444 applications in November 1992

Proportion for Bernoulli
  • In the AmEx data, the true population acceptance rate is 0.7809.
  • Y = 1 if the application is accepted, 0 if not.
  • E[y] = the population acceptance rate.
  • The estimator is the sample proportion, p_accept = (1/N) Σ_i y_i.
  • E[(1/N) Σ_i y_i] = the population acceptance rate.

Some Evidence
[Chart not reproduced; its legend marks homeowners.]
Does the acceptance rate depend on home ownership?

A Test of Independence
  • In the credit card example, are Own/Rent and Accept/Reject independent?
  • Hypothesis: Ownership and Acceptance are independent.
  • Formal hypothesis, based only on the laws of probability: Prob(Own, Accept) = Prob(Own)Prob(Accept) (and likewise for the other three possibilities).
  • Rejection region: joint frequencies that do not look like the products of the marginal frequencies.

Contingency Table Analysis
The Data: Frequencies
          Reject   Accept    Total
Rent       1,845    5,469    7,314
Own        1,100    5,030    6,130
Total      2,945   10,499   13,444

Step 1: Convert to Actual Proportions
          Reject    Accept     Total
Rent     0.13724   0.40680   0.54404
Own      0.08182   0.37414   0.45596
Total    0.21906   0.78094   1.00000

Independence Test
Step 2: Expected proportions assuming independence: if the factors are independent, then the joint proportions should equal the product of the marginal proportions.
[Rent, Reject]   0.54404 x 0.21906 = 0.11918
[Rent, Accept]   0.54404 x 0.78094 = 0.42486
[Own, Reject]    0.45596 x 0.21906 = 0.09988
[Own, Accept]    0.45596 x 0.78094 = 0.35608

Comparing Actual to Expected
It appears that the acceptance rate is dependent on home ownership.

When is the Chi Squared Large?
  • Critical values from chi squared table
  • Degrees of freedom = (R-1)(C-1).

Critical chi squared
D.F.     .05     .01
  1     3.84    6.63
  2     5.99    9.21
  3     7.81   11.34
  4     9.49   13.28
  5    11.07   15.09
  6    12.59   16.81
  7    14.07   18.48
  8    15.51   20.09
  9    16.92   21.67
 10    18.31   23.21
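The test can be carried out directly on the frequency table from the credit card example. The slide that defines the statistic did not survive, so the sketch below assumes the standard Pearson form, the sum over cells of (observed - expected)^2 / expected, with expected counts built from the marginal totals. Variable names are illustrative.

```python
# Observed counts from the credit card example
# (rows: Rent, Own; columns: Reject, Accept).
observed = [[1845, 5469],
            [1100, 5030]]

n = sum(sum(row) for row in observed)
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]

# Expected count under independence = (row total * column total) / N.
chi_squared = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi_squared += (count - expected) ** 2 / expected

degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
critical_05 = 3.84  # 5% critical value for 1 degree of freedom
print(chi_squared, degrees_of_freedom, chi_squared > critical_05)
```

The statistic comes out far above 3.84, in line with the conclusion that the acceptance rate depends on home ownership.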

Analyzing Default
  • Do renters default more often (at a different rate) than owners?
  • To investigate, we study the cardholders (only)

Cell contents: count, percent of total
                 DEFAULT
OWNRENT        0        1      All
0           4854      615     5469
           46.23     5.86    52.09
1           4649      381     5030
           44.28     3.63    47.91
All         9503      996    10499
           90.51     9.49   100.00

Hypothesis Test
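The test on this slide did not survive extraction. One standard way to compare the two default rates, assumed here, is a two-sample z test for proportions using the pooled rate under the null hypothesis of equal rates. The counts come from the OWNRENT by DEFAULT table; OWNRENT = 0 is taken to be renters because its total (5469) matches the number of accepted renters.

```python
import math

# Cardholder counts from the OWNRENT x DEFAULT table.
renters, renter_defaults = 5469, 615
owners, owner_defaults = 5030, 381

p_rent = renter_defaults / renters
p_own = owner_defaults / owners
# Pooled default rate under the null hypothesis of equal rates.
p_pool = (renter_defaults + owner_defaults) / (renters + owners)
std_err = math.sqrt(p_pool * (1.0 - p_pool) * (1.0 / renters + 1.0 / owners))
z = (p_rent - p_own) / std_err
print(p_rent, p_own, z, abs(z) > 1.96)
```

A value of |z| above 1.96 rejects equal default rates at the 5% level.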

Central Proposition: A Utility Based Approach
  • Observed outcomes partially reveal underlying preferences
  • There exists an underlying preference scale defined over alternatives, U*(choices)
  • Revelation of preferences between two choices labeled 0 and 1 reveals the ranking of the underlying utility
      - U*(choice 1) > U*(choice 0): choose 1
      - U*(choice 1) < U*(choice 0): choose 0
  • Net utility = U = U*(choice 1) - U*(choice 0). U > 0 => choice 1

Binary Outcome: Visit Doctor
In the 1984 year of the GSOEP, 1611 of 3874 individuals visited the doctor at least once.

More Formal Model of Acceptance and Default

Probability Models

Likelihood Function
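The likelihood on this slide was not recovered. As a hedged sketch, the standard binary choice log likelihood sums y*log F(x'b) + (1 - y)*log(1 - F(x'b)) over the sample, where F is the logistic CDF for the logit model and the standard normal CDF for the probit model. The five observations and the trial coefficient values below are made up purely to show the calculation.

```python
import math

def logistic_cdf(z):
    """Logit probability."""
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    """Probit probability (standard normal CDF)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def log_likelihood(beta, xs, ys, cdf):
    """Sum of y*log F(x'beta) + (1-y)*log(1 - F(x'beta)) over the sample."""
    total = 0.0
    for x, y in zip(xs, ys):
        index = sum(b * v for b, v in zip(beta, x))
        p = cdf(index)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

# Tiny made-up sample: each x is (constant, one regressor); y is the 0/1 outcome.
xs = [(1.0, -1.2), (1.0, -0.3), (1.0, 0.4), (1.0, 1.1), (1.0, 2.0)]
ys = [0, 0, 1, 0, 1]
for name, cdf in (("logit", logistic_cdf), ("probit", normal_cdf)):
    at_zero = log_likelihood((0.0, 0.0), xs, ys, cdf)
    at_trial = log_likelihood((-0.5, 1.0), xs, ys, cdf)
    print(name, at_zero, at_trial)
```

A maximum likelihood routine would search over beta for the largest value of this function; the sketch only evaluates it at two points.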

American Express, 1992

Logistic Model for Acceptance

Probit Default Model

Ordered Discrete Outcomes
  • E.g.: taste test, credit rating, course grade, preference scale
  • Underlying random preferences:
      - Existence of an underlying continuous preference scale
      - Mapping to observed choices
  • Strength of preferences is reflected in the discrete outcome
  • Censoring and discrete measurement
  • The nature of ordered data

Ordered Choices at IMDb

Health Satisfaction (HSAT)
Self administered survey: Health Care Satisfaction (0 – 10)
Continuous Preference Scale

Dueling Selection Biases – From two emails, same day.
  • “I am trying to find methods which can deal with data that is nonrandomised and suffers from selection bias.”
  • “I explain the probability of answering questions using, among other independent variables, a variable which measures knowledge breadth. Knowledge breadth can be constructed only for those individuals that fill in a skill description in the company intranet. This is where the selection bias comes from.”

The Crucial Element
  • Selection on the unobservables
      - Selection into the sample is based on both observables and unobservables
      - All the observables are accounted for
      - Unobservables in the selection rule also appear in the model of interest (or are correlated with unobservables in the model of interest)
  • “Selection Bias” = the bias due to not accounting for the unobservables that link the equations.

Canonical Sample Selection Model

Applications
  • Labor Supply model:
      - y* = wage - reservation wage
      - d = labor force participation
  • Attrition model: clinical studies of medicines
  • Survival bias in financial data
  • Income studies – value of a college application
  • Treatment effects
  • Any survey data in which respondents self select to report
  • Etc…

Estimation of the Selection Model
  • Two step least squares
      - Inefficient
      - Simple – exists in current software
      - Simple to understand and widely used
  • Full information maximum likelihood
      - Efficient
      - Simple – exists in current software
      - Not so simple to understand – widely misunderstood

Heckman’s Model

Two Step Estimation
The “LAMBDA”
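The formula for “LAMBDA” was not recovered from the slide. In standard treatments of the two step estimator it is the inverse Mills ratio, lambda(a) = phi(a) / Phi(a), evaluated at the fitted probit index and added as a regressor in the second step; the sketch below assumes that is what the slide shows. It computes the ratio and checks by simulation that it equals the mean of a standard normal u given u > -a. The index value a = 0.5 and the function names are hypothetical.

```python
import math
import random

def normal_pdf(a):
    return math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)

def normal_cdf(a):
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))

def inverse_mills(a):
    """lambda(a) = phi(a) / Phi(a), the mean of a standard normal u given u > -a."""
    return normal_pdf(a) / normal_cdf(a)

# Simulation check at a hypothetical probit index value a = 0.5:
# keep only the draws that are "selected" (u > -a) and average them.
rng = random.Random(2024)
a = 0.5
draws = (rng.gauss(0.0, 1.0) for _ in range(200000))
selected = [u for u in draws if u > -a]
simulated_mean = sum(selected) / len(selected)
print(inverse_mills(a), simulated_mean)
```

The nonzero mean of the selected disturbances is the source of the selection bias; including lambda as a regressor is how the two step method accounts for it.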

Classic Application
  • Mroz, T., Married women’s labor supply, Econometrica, 1987.
      - N = 753
      - N1 = 428
  • A (my) specification
      - LFP = f(age, age^2, family income, education, kids)
      - Wage = g(experience, exp^2, education, city)

Selection Equation
+---------------------------------------------+
| Binomial Probit Model                       |
| Dependent variable                  LFP     |
| Number of observations              753     |
| Log likelihood function       -490.8478     |
+---------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
 Index function for probability
 Constant    -4.15680692     1.40208596     -2.965    .0030
 AGE           .18539510      .06596666      2.810    .0049   42.5378486
 AGESQ        -.00242590      .00077354     -3.136    .0017   1874.54847
 FAMINC      .458045D-05    .420642D-05      1.089    .2762   23080.5950
 WE            .09818228      .02298412      4.272    .0000   12.2868526
 KIDS         -.44898674      .13091150     -3.430    .0006    .69588313

Heckman Estimator

TECHNICAL EFFICIENCY ANALYSIS CORRECTING FOR BIASES FROM OBSERVED AND UNOBSERVED VARIABLES: AN APPLICATION TO A NATURAL RESOURCE MANAGEMENT PROJECT
Empirical Economics: Volume 43, Issue 1 (2012), Pages 55-72
Boris Bravo-Ureta, University of Connecticut
Daniel Solis, University of Miami
William Greene, New York University

The MARENA Program in Honduras
Several programs have been implemented to address resource degradation while also seeking to improve productivity and managerial performance and to reduce poverty (and in some cases to make up for a lack of public support). One such effort is the Programa Multifase de Manejo de Recursos Naturales en Cuencas Prioritarias, or MARENA, in Honduras, which focuses on small scale hillside farmers.

OVERALL CONCEPTUAL FRAMEWORK
[Diagram elements: MARENA; Training & Financing; Natural, Human & Social Capital; Off-Farm Income; More Production and Productivity; More Farm Income; Sustainability]
Working HYPOTHESIS: if farmers receive private benefits (higher income) from project activities (e.g., training, financing), then adoption is likely to be sustainable and to generate positive externalities.

The MARENA Program
COMPONENT I: Strengthening Strategic Management Capabilities among Govt. Institutions (central and local)
COMPONENT II: Support to Nat. Res. Management Projects
  - Module 1: Promotion and Organization
  - Module 2: Strengthening Local Institutions & Organizations
  - Module 3: Investment (farm, municipal & regional)
COMPONENT III: Administration and Supervision

Component II - Module 3 focused on promoting investments in sustainable production systems, with a budget of US $7.6 million (Bravo-Ureta, 2009). The major activities undertaken with beneficiaries were training in business management and sustainable farming practices, and the provision of funds to co-finance investment activities through local rural savings associations (cajas rurales).

Expected Impact Evaluation

Cornwell and Rupert Data
Cornwell and Rupert Returns to Schooling Data, 595 Individuals, 7 Years
Variables in the file are
EXP    = work experience
WKS    = weeks worked
OCC    = occupation, 1 if blue collar
IND    = 1 if manufacturing industry
SOUTH  = 1 if resides in south
SMSA   = 1 if resides in a city (SMSA)
MS     = 1 if married
FEM    = 1 if female
UNION  = 1 if wage set by union contract
ED     = years of education
LWAGE  = log of wage = dependent variable in regressions
These data were analyzed in Cornwell, C. and Rupert, P., "Efficient Estimation with Panel Data: An Empirical Comparison of Instrumental Variable Estimators," Journal of Applied Econometrics, 3, 1988, pp. 149-155. See Baltagi, page 122 for further analysis. The data were downloaded from the website for Baltagi's text.

Specification

The Effect of Education on LWAGE

What Influences LWAGE?

An Exogenous Influence

Instrumental Variables
  • Structure
      - LWAGE (ED, EXPSQ, WKS, OCC, SOUTH, SMSA, UNION)
      - ED (MS, FEM)
  • Reduced Form: LWAGE[ ED (MS, FEM), EXPSQ, WKS, OCC, SOUTH, SMSA, UNION ]

Two Stage Least Squares Strategy
  • Reduced Form: LWAGE[ ED (MS, FEM, X), EXPSQ, WKS, OCC, SOUTH, SMSA, UNION ]
  • Strategy
      - (1) Purge ED of the influence of everything but MS, FEM (and the other variables). Predict ED using all exogenous information in the sample (X and Z).
      - (2) Regress LWAGE on this prediction of ED and everything else.
      - Standard errors must be adjusted for the predicted ED.
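The two steps can be illustrated on simulated data. This is a sketch under stated assumptions, not the Cornwell and Rupert analysis: there is one endogenous regressor x (playing the role of ED), one instrument z (playing the role of MS or FEM), no other regressors, and a true coefficient of 0.5 chosen arbitrarily. The helper name ols is hypothetical.

```python
import random

def ols(x, y):
    """Intercept and slope from a simple regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

# Simulated data: z is the instrument; u is an unobservable that drives
# both x and y, which is what makes x endogenous.
rng = random.Random(7)
n = 20000
z = [rng.gauss(0.0, 1.0) for _ in range(n)]
u = [rng.gauss(0.0, 1.0) for _ in range(n)]
x = [zi + ui for zi, ui in zip(z, u)]
y = [1.0 + 0.5 * xi + 0.8 * ui + rng.gauss(0.0, 1.0) for xi, ui in zip(x, u)]

# Step 1: predict x using the exogenous information.
a0, a1 = ols(z, x)
x_hat = [a0 + a1 * zi for zi in z]
# Step 2: regress y on the prediction of x.
_, b_2sls = ols(x_hat, y)
# Ordinary least squares on the original x, for comparison.
_, b_ols = ols(x, y)
print(b_ols, b_2sls)
```

In this design ordinary least squares overstates the coefficient while the two step estimate lands near 0.5. The sketch does not compute standard errors; those from the second regression would need the adjustment noted on the slide.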
