
A Multivariate Regression Analysis of Hospital Stays in a Nosocomial Infection Control Data
Principal Investigator: Dr. Linda B. Hayden
Mentor: Dr. Julian A. Allagan
Team Members: Matthew Hill, Jessica Hathaway, Lilshay Rogers, Heaven Tate

Abstract
In this report we developed and analyzed several linear regression models to predict the length of hospital stay of patients in the U.S., using the SENIC project data from CDC-Atlanta. We examined several potential explanatory variables and their relationships with the response variable “Stay”, with the goal of determining which factors most influenced patients' length of stay in this nosocomial (hospital-acquired) infection control data. In particular, the report aims to answer the following question: given the data, which leading factors help explain the length of hospital stays in the U.S.? In at least one model, we found that Age and Region influenced the variable “Stay” the most.

Introduction
• In 2011, hospital in-patient expenses accounted for almost one-third of all healthcare expenditures in the United States, compared with prescription medicines, which accounted for about one-fifth of total medical expenses.
• In 2012, there were 36.5 million hospital stays in the US, with an average length of stay of 4.5 days and an average cost of $10,400 per stay.
• Using data exploration and linear regression analysis tools, we identify associations between hospital stays and several factors.

Data
• 1 file: ‘Hospital.txt’ (8 KB)
• 12 variables from 113 US hospital records
• SENIC data, CDC Atlanta
  ✓ Age (Age)
  ✓ Infection Risk (Risk)
  ✓ Routine Culturing Ratio (Culturing)
  ✓ Number of Beds (Bed)
  ✓ Regions (Region)
  ✓ Number of Nurses (Nurse)

Variable Distributions

Variable          Min   1st Q   Median   Mean   3rd Q   Max
Stay (days)        11       8        9     10      11    20
Age (years)        39      51       53     53      56    66
Risk (percent)    1.3     3.7      4.4    4.3     5.2   7.8
Beds               29     106      186    252     312   835
Nurses             14      66      132    173     218   656
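
The six summary columns above can be reproduced for any numeric variable with a few lines of Python. This is a minimal sketch, not the team's actual workflow: the `summarize` helper and the sample values are hypothetical, and the "inclusive" quartile rule is an assumption, since the slides do not say which quartile definition was used.

```python
import statistics

def summarize(values):
    """Return (min, Q1, median, mean, Q3, max) for a list of numbers."""
    # "inclusive" interpolates between observed points; other quartile
    # definitions can give slightly different Q1 and Q3 values.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, q2, statistics.mean(values), q3, max(values))

# Hypothetical lengths of stay in days (NOT the SENIC values).
stay = [7.1, 8.3, 8.8, 9.4, 9.9, 10.5, 11.2, 19.6]

labels = ("Min", "1st Q", "Median", "Mean", "3rd Q", "Max")
for label, value in zip(labels, summarize(stay)):
    print(f"{label:>6}: {value:.2f}")
```

A maximum far above the third quartile, as in the Stay, Beds, and Nurses rows, is the usual sign of a right-skewed variable.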

Hospital Stay Boxplot

Distribution of the response variable Stay (length of stay)

Analytical Process
• 11 explanatory variables (covariates)
• OLS with Length of Stay as response
• Stepwise selection using AIC
• 4-6 predictors
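
The process above (OLS fits compared by AIC in a stepwise search) can be sketched in plain Python. This is an illustration under stated assumptions, not the team's code: it uses forward selection only, since the slides do not state the search direction; the AIC is the common least-squares form n·ln(RSS/n) + 2k, defined up to an additive constant; and the data and the names `x1` to `x4` are synthetic stand-ins rather than SENIC variables.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols_rss(columns, y):
    """Fit OLS with an intercept; return the residual sum of squares."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))

def aic(columns, y):
    """AIC up to a constant: n*ln(RSS/n) + 2*(number of coefficients)."""
    n = len(y)
    return n * math.log(ols_rss(columns, y) / n) + 2 * (len(columns) + 1)

def forward_stepwise(data, y):
    """Greedily add the predictor that lowers AIC most; stop when none does."""
    chosen, best = [], aic([], y)
    while True:
        scores = {name: aic([data[c] for c in chosen + [name]], y)
                  for name in data if name not in chosen}
        if not scores:
            return chosen, best
        name = min(scores, key=scores.get)
        if scores[name] >= best:
            return chosen, best
        chosen.append(name)
        best = scores[name]

# Synthetic data: only x1 and x2 truly drive the response.
random.seed(0)
n = 113
data = {name: [random.gauss(0, 1) for _ in range(n)]
        for name in ("x1", "x2", "x3", "x4")}
y = [10 + 2 * a - 1.5 * b + random.gauss(0, 0.5)
     for a, b in zip(data["x1"], data["x2"])]

selected, score = forward_stepwise(data, y)
print("selected predictors:", selected)
print("AIC of selected model:", round(score, 1))
```

Because the search is greedy, it can keep a noise variable or miss a useful combination, which is one reason the selected model's assumptions still need to be checked afterwards.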

Investigation: Leading factors in length of Stay
Model A: Risk, Region, Census, Nurses, Age
Model B:
Model C: Risk, Region, Census, Nurses, Age, Xray, Beds

Model A
The predictors explain about 59% (R² = 0.59) of the variation observed in length of stay in this model. Moreover, the model as a whole is statistically significant (overall p-value < 2.2e-16).

Model B
The predictors explain about 61% (R² = 0.61) of the variation observed in length of stay in this model. Moreover, the model as a whole is statistically significant (overall p-value < 2.2e-16).

Model C
The predictors explain about 62% (R² = 0.62) of the variation observed in length of stay in this model. Moreover, the model as a whole is statistically significant (overall p-value < 2.2e-16).

Model Building Process Summary
• Pool of Variables
• Model A (R-sq = 0.59)
• Model B (R-sq = 0.61)
• Model C (R-sq = 0.62)
• Model A Chosen
• All variables uncorrelated
• Check Model Assumptions
• Model A as “Best” Predictive Model
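
Since R-sq cannot decrease when predictors are added, the small gains from 0.59 to 0.62 say little on their own about which model is preferable. One common complement is adjusted R-sq, which penalizes each extra predictor. The sketch below is illustrative only: the `adjusted_r2` helper is hypothetical, n = 113 comes from the Data slide, the count of 5 predictors for Model A matches its five reported coefficients, and the counts of 6 and 7 for Models B and C are assumptions.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a model with p predictors plus an intercept, fit to n rows."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

n = 113  # hospitals in the data set

# (R^2 from the slides, number of predictors).
# The predictor counts for Models B and C are assumptions.
models = {"A": (0.59, 5), "B": (0.61, 6), "C": (0.62, 7)}

for name, (r2, p) in models.items():
    print(f"Model {name}: R^2 = {r2:.2f}, "
          f"adjusted R^2 = {adjusted_r2(r2, n, p):.3f}")
```

Adjusted R-sq is only one criterion; parsimony and the assumption checks listed above can still favor a smaller model.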

Regression Output For Our Final Model

Variable   Coefficient   Standard Error   P-value
Risk              0.54             0.10   1.54e-07
Region           -0.68             0.12   1.21e-07
Census            0.01             0.00   1.18e-06
Nurses           -0.01             0.00   0.00058
Age               0.08             0.03   0.00456

All variables are statistically significant at the 5% level (95% confidence), with small standard errors on the coefficient estimates.
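
The significance of each coefficient follows from its ratio to its standard error. As a rough check, the sketch below recomputes that ratio and an approximate 95% interval from the rounded values reported for Risk, Region, and Age. The `t_stat_and_ci` helper is hypothetical, the interval uses a normal approximation rather than the exact t distribution, and Census and Nurses are omitted because a standard error rounded to 0.00 cannot be used as a divisor.

```python
from statistics import NormalDist

def t_stat_and_ci(coef, se, level=0.95):
    """Return coef/se and a normal-approximation confidence interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return coef / se, (coef - z * se, coef + z * se)

# Rounded (coefficient, standard error) pairs taken from the slide.
estimates = {"Risk": (0.54, 0.10), "Region": (-0.68, 0.12), "Age": (0.08, 0.03)}

for name, (coef, se) in estimates.items():
    t, (low, high) = t_stat_and_ci(coef, se)
    print(f"{name:>6}: t = {t:6.2f}, approx. 95% CI = ({low:.2f}, {high:.2f})")
```

An interval that excludes zero corresponds to significance at the 5% level. Because the inputs are rounded to two decimals, the results only approximate the reported p-values.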

Conclusion
• We fitted a standard ordinary least squares (OLS) regression model to this data.
• The model indicates that variables such as Risk, Region, and Nurses play bigger roles in affecting patients' length of stay. Age is almost irrelevant.
• Recommendation: Find ways to lower the risk of infection, and perhaps increase the number of nurses.
• Future work: Instead of a stepwise selection criterion, use a machine learning algorithm such as GBM (Gradient Boosting Machine), which may identify a different set of predictors from the data.
• Future question: What factors contribute the most to the increase in risk of a nosocomial infection?

References
1. Gonzalez JM. National Health Care Expenses in the U.S. Civilian Noninstitutionalized Population, 2011. MEPS Statistical Brief No. 425. Rockville, MD: Agency for Healthcare Research and Quality, 2013. http://meps.ahrq.gov/data_files/publications/st425/stat425.pdf
2. Weiss AJ (Truven Health Analytics), Elixhauser A (AHRQ). Overview of Hospital Stays in the United States, 2012. HCUP Statistical Brief #180. October 2014. Agency for Healthcare Research and Quality, Rockville, MD. http://www.hcup-us.ahrq.gov/reports/statbriefs/sb180-Hospitalizations-United-States-2012.pdf
3. Special issue, “The SENIC Project,” American Journal of Epidemiology 111 (1980), pp. 465-653. Data obtained from Robert W. Haley, M.D., Hospital Infections Program, Center for Infectious Disease, Center for Disease Control, Atlanta, Georgia 30333.
4. Kutner, Nachtsheim, Neter and Li, Applied Linear Statistical Models, 5th ed., McGraw-Hill, 2004.

Questions