What is Regression Analysis? Regression analysis is used to model the relationship between a dependent variable and one or more independent variables.

Linear Regression
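As a minimal sketch of simple linear regression in code (scikit-learn is an assumed library choice; the data are synthetic):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data from a known line: y = 2x + 1, plus a little noise
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 50).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0 + rng.normal(0, 0.1, 50)

# Ordinary least squares recovers the slope and intercept
model = LinearRegression().fit(X, y)
print(model.coef_[0], model.intercept_)  # close to 2 and 1
```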

Polynomial Regression • In the slide's plot, the red (curved) fit describes the data better than the green (straight) one. • In situations where the relationship between the dependent and independent variable seems to be non-linear, we can deploy polynomial regression models.
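One common way to fit such a curve (a sketch, assuming scikit-learn) is to expand the input into polynomial features and then run ordinary least squares on the expanded features:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Quadratic ground truth: y = x^2 - 3x + 2 (noise-free, so the fit is exact)
X = np.linspace(-5, 5, 40).reshape(-1, 1)
y = X.ravel() ** 2 - 3 * X.ravel() + 2

# Degree-2 polynomial regression: feature expansion + linear least squares
poly_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly_model.fit(X, y)
print(poly_model.predict([[10.0]]))  # close to 10^2 - 3*10 + 2 = 72
```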

Quantile (percentile) Regression • Generally used when outliers, high skewness, and heteroscedasticity exist in the data. • Aims to estimate the conditional median or other quantiles of the response variable. • We try to estimate a quantile of the dependent variable given the values of the X's.

Logistic Regression • The dependent variable is binary. • y follows a binomial distribution and hence is not normal. • The error terms are not normally distributed.
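A minimal sketch with scikit-learn (an assumed library choice; the pass/fail-versus-hours-studied data are illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Binary outcome (0 = fail, 1 = pass) as a function of hours studied
X = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = LogisticRegression().fit(X, y)
print(clf.predict([[1.0], [4.0]]))       # → [0 1]
print(clf.predict_proba([[4.0]])[0, 1])  # estimated probability of class 1
```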

Cox Regression (survival analysis; proportional hazards model) • Investigates the effect of several variables on the time a specified event takes to happen. • Time-to-event data, e.g. time from the first heart attack to the second. • Dual targets are set for the survival model: 1. A continuous variable representing the time to event. 2. A binary variable representing whether the event occurred or not.
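In practice one would fit this with a dedicated survival library (e.g. lifelines in Python); purely to illustrate the underlying idea, here is a hand-rolled sketch that maximizes the Cox partial likelihood for a single covariate on toy data (all numbers illustrative):

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Toy time-to-event data: observed time, event indicator (0 = censored), covariate
times  = np.array([5.0, 8.0, 12.0, 20.0, 25.0, 30.0])
events = np.array([1, 1, 0, 1, 1, 0])
x      = np.array([1.0, 1.0, 0.0, 0.0, 1.0, 0.0])

def neg_log_partial_likelihood(beta):
    # Cox partial likelihood: for each event, the subject's hazard
    # exp(beta * x_i) relative to everyone still at risk at that time
    ll = 0.0
    for i in range(len(times)):
        if events[i] == 1:
            at_risk = times >= times[i]  # risk set at this event time
            ll += beta * x[i] - np.log(np.sum(np.exp(beta * x[at_risk])))
    return -ll

beta_hat = minimize_scalar(neg_log_partial_likelihood).x
print(beta_hat)  # positive: covariate = 1 is associated with earlier events
```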

Ordinal Regression • The dependent variable takes ordinal (ranked) values. • Examples of ordinal variables: survey responses (1-to-6 scale), patient reaction to a drug dose (none, mild, severe). • Ordinal regression can be performed using a generalized linear model (GLM) that fits both a coefficient vector and a set of thresholds to a dataset.

Poisson Regression (log-linear model) • The dependent variable is count data and must meet the following conditions: 1) It has a Poisson distribution. 2) Counts cannot be negative. 3) The method is not suitable for non-whole numbers.
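A sketch with scikit-learn's `PoissonRegressor` (an assumed library choice) on synthetic counts generated from a known log-linear model:

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

# Counts drawn from a Poisson model with log(mean) = 0.5 + 0.3 x
rng = np.random.default_rng(0)
X = rng.uniform(0, 2, size=(500, 1))
y = rng.poisson(np.exp(0.5 + 0.3 * X.ravel()))

# alpha=0 turns off the default L2 penalty so we recover the raw coefficients
pois = PoissonRegressor(alpha=0.0).fit(X, y)
print(pois.coef_[0], pois.intercept_)  # roughly 0.3 and 0.5
```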

Negative Binomial Regression • Deals with count data. • Does not assume that the variance of the counts equals their mean. • Deals with overdispersion.

Quasi-Poisson Regression • An alternative to negative binomial regression for overdispersed count data. • Both algorithms give similar results, but there are differences in how they estimate the effects of covariates. • The variance of a quasi-Poisson model is a linear function of the mean, while the variance of a negative binomial model is a quadratic function of the mean. • Can handle both over-dispersion and under-dispersion.

Principal Components Regression (PCR) • Based on principal component analysis (PCA). • Calculate the principal components and then use some of these components as predictors in a linear regression model fitted with the usual least squares procedure. • Provides dimensionality reduction and removal of multicollinearity.
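The PCA-then-least-squares pipeline can be sketched directly (scikit-learn assumed; ten correlated predictors generated from two latent factors):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Ten highly correlated predictors that really live in a 2-D latent space
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 10)) + rng.normal(0, 0.01, (200, 10))
y = latent[:, 0] - 2 * latent[:, 1] + rng.normal(0, 0.1, 200)

# PCR: project onto the first 2 principal components, then ordinary least squares
pcr = make_pipeline(PCA(n_components=2), LinearRegression())
pcr.fit(X, y)
print(pcr.score(X, y))  # R^2 close to 1: two components capture the signal
```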

Partial Least Squares (PLS) Regression • An alternative to principal components regression when the independent variables are highly correlated. It is also useful when there is a large number of independent variables. • Finds a linear regression model by projecting the predicted variables and the observable variables into a new space.

Ridge Regression • A technique for analyzing multiple regression data that suffer from multicollinearity. • Adds an L2 penalty controlled by a regularization parameter. • When multicollinearity occurs, least squares estimates are unbiased, but their variances are large, so they may be far from the true value.
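The large-variance problem can be sketched with two nearly identical predictors (scikit-learn assumed; the penalty strength is an illustrative choice):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Two collinear predictors: x2 is almost an exact copy of x1
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(0, 0.01, 200)
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(0, 0.1, 200)  # true coefficients (1, 1)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)
print(ols.coef_)    # unstable: can swing far from (1, 1) in opposite directions
print(ridge.coef_)  # the L2 penalty splits the weight evenly, both near 1
```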

Lasso Regression (Least Absolute Shrinkage and Selection Operator) • Performs both variable selection and regularization to enhance the prediction accuracy and interpretability of the resulting statistical model. • L1 regularization technique: minimizes the objective function with a penalty term on the sum of the absolute values of the coefficients.
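The variable-selection behavior can be sketched as follows (scikit-learn assumed; the penalty strength is illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso

# Ten predictors, but only the first two actually influence y
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.1, 200)

# The L1 penalty shrinks the active coefficients slightly and drives
# the irrelevant ones exactly to zero
lasso = Lasso(alpha=0.1).fit(X, y)
print(np.round(lasso.coef_, 2))  # first two near 3 and -2, the rest zero
```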

Elastic Net Regression • A regularized regression method that linearly combines the L1 and L2 penalties of the lasso and ridge methods. • Preferred over both ridge and lasso regression when dealing with highly correlated independent variables.
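A sketch of the combined penalty on correlated predictors (scikit-learn assumed; `alpha` and `l1_ratio` values are illustrative):

```python
import numpy as np
from sklearn.linear_model import ElasticNet

# Two highly correlated informative predictors plus three pure-noise predictors
rng = np.random.default_rng(0)
x1 = rng.normal(size=300)
x2 = x1 + rng.normal(0, 0.05, 300)  # nearly a copy of x1
X = np.column_stack([x1, x2, rng.normal(size=(300, 3))])
y = x1 + x2 + rng.normal(0, 0.1, 300)

# l1_ratio mixes the penalties: 1.0 would be pure lasso, 0.0 pure ridge
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print(np.round(enet.coef_, 2))  # weight shared across the correlated pair,
                                # noise coefficients pushed to zero
```

Unlike pure lasso, which tends to pick one of a correlated pair and drop the other, the L2 component keeps both correlated predictors in the model with similar weights.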

Support Vector Regression / Machine • Can solve both linear and non-linear models. • Non-parametric. • Uses non-linear kernel functions (such as polynomial) to find the optimal solution for non-linear models.
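A sketch of kernel SVR on a non-linear target (scikit-learn assumed; the RBF kernel and hyperparameter values are illustrative choices):

```python
import numpy as np
from sklearn.svm import SVR

# Non-linear target: y = sin(x) plus a little noise
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 6, 200)).reshape(-1, 1)
y = np.sin(X.ravel()) + rng.normal(0, 0.05, 200)

# RBF kernel lets the model capture the curvature a linear fit would miss
svr = SVR(kernel="rbf", C=10.0, epsilon=0.05).fit(X, y)
print(svr.predict([[np.pi / 2]]))  # close to sin(pi/2) = 1
```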