
Using Artificial Neural Networks to Predict Malignancy of Ovarian Tumors

C. Lu¹, J. De Brabanter¹, S. Van Huffel¹, I. Vergote², D. Timmerman²
¹ Department of Electrical Engineering, Katholieke Universiteit Leuven, Belgium
² Department of Obstetrics and Gynecology, University Hospitals Leuven, Belgium

EMBC 2001

Overview
  • Introduction
  • Data Exploration
  • Input Selection
  • Model Building
  • Model Evaluation
  • Conclusions

Introduction
  • Problem
      - Ovarian masses are a common problem in gynecology.
      - Goal: develop a reliable diagnostic tool to discriminate preoperatively between benign and malignant tumors.
      - Assist clinicians in choosing the appropriate treatment.
  • Data
      - Patient data collected at University Hospitals Leuven, Belgium, 1994-1999.
      - 425 records, 25 features.
      - 291 benign tumors, 134 (32%) malignant tumors.

Introduction
  • Methods
      - Data exploration: data preprocessing, univariate analysis, PCA, factor analysis, discriminant analysis, logistic regression…
      - Modeling: logistic regression (LR) models; artificial neural networks (ANN): MLP, RBF
      - Performance measures: receiver operating characteristic (ROC) analysis
  • ROC curves
      - Constructed by plotting the sensitivity versus 1 - specificity (the false positive rate) for varying probability cutoff levels.
      - Visualize the relationship between the sensitivity and specificity of a test.
  • Area under the ROC curve (AUC)
      - Measures the probability that the classifier ranks a randomly chosen event above a randomly chosen nonevent.
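The ROC construction and the AUC described on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names `roc_points` and `auc` and the four-case toy data are assumptions made for the example, and the area is computed with the trapezoidal rule.

```python
def roc_points(labels, scores):
    """Return (false positive rate, sensitivity) pairs, one per cutoff."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    # Sweep the probability cutoff from the highest score downwards.
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= cutoff and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= cutoff and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(labels, scores):
    """Area under the ROC curve by the trapezoidal rule."""
    pts = roc_points(labels, scores)
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Toy data (hypothetical): 1 = malignant, 0 = benign.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc(labels, scores))  # 0.75
```

In the toy data one of the four malignant/benign pairs is ranked in the wrong order, which is why the area is 0.75 rather than 1.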

Data exploration
  • Univariate analysis
      - preprocessing
      - descriptive statistics, histograms…
  • Variables: demographic, serum marker, color Doppler imaging and morphologic variables

Data exploration
  • Multivariate analysis
      - factor analysis
      - biplots

Fig. Biplot of Ovarian Tumor data. The observations are plotted as points (0 = benign, 1 = malignant); the variables are plotted as vectors from the origin.
      - visualization of the correlation between the variables
      - visualization of the relations between the variables and clusters

Input Selection
  • Stepwise logistic regression analysis
  • Searching in the feature space
      - Fix several of the most significant variables, then vary combinations with the other predictive variables.
      - Different logistic regression models with different subsets of input variables were built and validated.
      - Subsets of variables were selected according to their predictive performance on the training set and the test set.
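The feature-space search described on this slide (keep a fixed core of variables, vary the rest) can be sketched as an enumeration of candidate subsets. The variable names, the `max_extra` limit and the `evaluate` callback below are hypothetical placeholders; in the study each subset would be scored by building and validating a logistic regression model.

```python
from itertools import combinations


def candidate_subsets(fixed, optional, max_extra):
    """Yield subsets made of all fixed variables plus up to max_extra optional ones."""
    for r in range(max_extra + 1):
        for extra in combinations(optional, r):
            yield tuple(fixed) + extra


def best_subset(fixed, optional, max_extra, evaluate):
    """Return the candidate subset with the highest evaluate(subset) score."""
    return max(candidate_subsets(fixed, optional, max_extra), key=evaluate)


# Hypothetical variable names, for illustration only.
fixed = ["a", "b"]
optional = ["c", "d", "e"]
subsets = list(candidate_subsets(fixed, optional, max_extra=2))
print(len(subsets))  # 7 candidates: 1 + 3 + 3
```

Passing a scoring function such as a validation AUC as `evaluate` turns the enumeration into the selection step.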

Model building
  • Logistic regression (LR) model
  • Artificial neural networks
      - Feed-forward neural networks, universal approximators:
          - multi-layer perceptron (MLP)
          - generalized regression network (GRNN)
      - Generalization capacity: the central issue during network design and training.

Model building - LR
  • Parameter estimation
      - maximum likelihood
      - iterative procedure

Fig. Architecture of LRs for Predicting Malignancy of Ovarian Tumors.
  • Structure
      - LR1: 8-1
      - LR2: 7-1
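To make "maximum likelihood, iterative procedure" concrete, the sketch below fits a logistic regression by repeatedly stepping along the gradient of the log-likelihood. Plain batch gradient ascent is swapped in here for simplicity; the slide does not say which iterative scheme was used. The function names, the learning rate and the one-feature toy data are all assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """Predicted probability of the positive class for one input vector x."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Maximise the Bernoulli log-likelihood by batch gradient ascent."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = yi - predict_proba(w, b, xi)  # gradient of the log-likelihood
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj + lr * g / len(X) for wj, g in zip(w, grad_w)]
        b += lr * grad_b / len(X)
    return w, b


# Toy one-feature data (hypothetical, deliberately not perfectly separable).
X = [[0.5], [1.0], [1.5], [2.0], [2.5], [3.0]]
y = [0, 0, 1, 0, 1, 1]
w, b = fit_logistic(X, y)
print(predict_proba(w, b, [0.5]), predict_proba(w, b, [3.0]))
```

At the maximum-likelihood solution the fitted probabilities sum to the number of positive cases, which gives a quick sanity check on convergence.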

Model Building - ANN - MLP
  • Training: Bayesian regularization combined with Levenberg-Marquardt optimization.

Fig. Architecture of MLPs for Predicting Malignancy of Ovarian Tumors.
  • Structure
      - MLP1: 8-3-1
      - MLP2: 7-3-1
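The 8-3-1 and 7-3-1 structures can be read as "inputs - hidden units - output". The sketch below only shows the forward pass and the parameter count of such a network. The tanh hidden units and logistic output are assumptions (the slide does not name the activation functions), and the Bayesian-regularized Levenberg-Marquardt training is not implemented here.

```python
import math
import random


def init_mlp(n_in, n_hidden, seed=0):
    """Random small weights for an n_in - n_hidden - 1 network."""
    rng = random.Random(seed)
    return {
        "W1": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "W2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(params, x):
    """Forward pass: tanh hidden layer, logistic output (assumed activations)."""
    hidden = [math.tanh(b + sum(w * xi for w, xi in zip(row, x)))
              for row, b in zip(params["W1"], params["b1"])]
    z = params["b2"] + sum(w * h for w, h in zip(params["W2"], hidden))
    return 1.0 / (1.0 + math.exp(-z))


def n_parameters(params):
    """Total number of weights and biases."""
    return (sum(len(row) for row in params["W1"]) + len(params["b1"])
            + len(params["W2"]) + 1)


mlp1 = init_mlp(n_in=8, n_hidden=3)  # structure 8-3-1
mlp2 = init_mlp(n_in=7, n_hidden=3)  # structure 7-3-1
print(n_parameters(mlp1), n_parameters(mlp2))  # 31 28
```

With only 31 and 28 free parameters the networks are small, which fits the slide's emphasis on generalization capacity.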

Model Building - ANN - GRNN
  • Training: GRNN is another term for Nadaraya-Watson kernel regression. There is no iterative training; the widths h of the RBF units act as smoothing parameters and are chosen by cross-validation.

Fig. Architecture of GRNNs for Predicting Malignancy of Ovarian Tumors.
  • Structure
      - GRNN1: 8-N-1
      - GRNN2: 7-N-1
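A Nadaraya-Watson estimate is a kernel-weighted average of the training targets, so "training" reduces to picking the width h. The sketch below assumes a Gaussian kernel with a single shared width and uses leave-one-out cross-validation on squared error; the slide says only "cross-validation", so both choices, the function names and the toy data are illustrative assumptions.

```python
import math


def nw_predict(X_train, y_train, x, h):
    """Nadaraya-Watson estimate at x with a Gaussian kernel of width h."""
    weights = []
    for xi in X_train:
        d2 = sum((a - b) ** 2 for a, b in zip(xi, x))
        weights.append(math.exp(-d2 / (2.0 * h * h)))
    total = sum(weights)
    if total == 0.0:  # every kernel underflowed: fall back to the global mean
        return sum(y_train) / len(y_train)
    return sum(w * y for w, y in zip(weights, y_train)) / total


def loo_error(X, y, h):
    """Leave-one-out mean squared error for width h."""
    err = 0.0
    for i in range(len(X)):
        X_rest = X[:i] + X[i + 1:]
        y_rest = y[:i] + y[i + 1:]
        err += (y[i] - nw_predict(X_rest, y_rest, X[i], h)) ** 2
    return err / len(X)


def choose_width(X, y, candidates):
    """Pick the candidate width with the smallest leave-one-out error."""
    return min(candidates, key=lambda h: loo_error(X, y, h))


# Toy one-feature data (hypothetical): two well-separated groups.
X = [[0.0], [0.2], [0.4], [1.6], [1.8], [2.0]]
y = [0, 0, 0, 1, 1, 1]
h = choose_width(X, y, candidates=[0.1, 0.5, 5.0])
print(h, nw_predict(X, y, [0.1], h), nw_predict(X, y, [1.9], h))
```

The N in the 8-N-1 structure corresponds to one kernel unit per training case, which is why the loop in `nw_predict` runs over the whole training set.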

Model Evaluation - Holdout CV
AUC estimates and standard errors from holdout CV
  • Training set: data from the first 265 patients treated
  • Test set: data from the most recent 160 patients treated
  • RMI: risk of malignancy index, $\mathrm{RMI} = \mathrm{score}_{\mathrm{morph}} \times \mathrm{score}_{\mathrm{meno}} \times \mathrm{CA\,125}$
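As a small worked example of this slide, the sketch below splits chronologically ordered records into the 265/160 holdout sets and evaluates the RMI product. The function names are illustrative, the records are placeholder indices, and the patient values passed to `rmi` are hypothetical (the slide does not give the scoring rules for the morphology and menopausal scores).

```python
def chronological_holdout(records, n_train=265):
    """Split records that are already ordered by treatment date."""
    return records[:n_train], records[n_train:]


def rmi(score_morph, score_meno, ca125):
    """Risk of malignancy index as the product of its three factors."""
    return score_morph * score_meno * ca125


records = list(range(425))  # placeholder patient indices, oldest first
train, test = chronological_holdout(records)
print(len(train), len(test))  # 265 160
print(rmi(score_morph=3, score_meno=3, ca125=50.0))  # 450.0 (hypothetical values)
```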

Model Evaluation - K-fold CV
Figures: box plot of mean AUC from 7-fold CV; expected ROC curves from k-fold CV.
  • Stratified 7-fold CV
  • For each run of 7-fold CV:
      - $\mathrm{mAUC} = \frac{1}{7}\sum_{i=1}^{7} \mathrm{AUC}_i$, where $\mathrm{AUC}_i$ is the AUC on the $i$th validation set
      - expected ROC: obtained by averaging
  • Repeat 7-fold CV 30 times with different partitions => better statistical estimate
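The repeated stratified 7-fold procedure and the mAUC can be sketched as below. Everything here is a stand-in under stated assumptions: `fit_mean_difference` is a hypothetical toy model (not LR, MLP or GRNN), the data are synthetic, and the AUC is computed in its pairwise form (fraction of positive/negative pairs ranked correctly, ties counted as one half).

```python
import random


def auc(labels, scores):
    """Fraction of positive/negative pairs ranked correctly (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k, rng):
    """Split indices into k folds, keeping the class ratio roughly equal."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for position, i in enumerate(idx):
            folds[position % k].append(i)
    return folds


def repeated_cv_mauc(X, y, fit, k=7, repeats=30, seed=0):
    """Return one mAUC per repetition of stratified k-fold CV."""
    rng = random.Random(seed)
    maucs = []
    for _ in range(repeats):
        fold_aucs = []
        for fold in stratified_folds(y, k, rng):
            held_out = set(fold)
            train = [i for i in range(len(y)) if i not in held_out]
            score = fit([X[i] for i in train], [y[i] for i in train])
            fold_aucs.append(auc([y[i] for i in fold],
                                 [score(X[i]) for i in fold]))
        maucs.append(sum(fold_aucs) / k)
    return maucs


def fit_mean_difference(X_train, y_train):
    """Hypothetical stand-in model: project onto the difference of class means."""
    n_feat = len(X_train[0])

    def class_mean(cls):
        rows = [x for x, y in zip(X_train, y_train) if y == cls]
        return [sum(r[j] for r in rows) / len(rows) for j in range(n_feat)]

    direction = [a - b for a, b in zip(class_mean(1), class_mean(0))]
    return lambda x: sum(d * xj for d, xj in zip(direction, x))


# Synthetic one-feature data (not the study's data): 70 negatives, 35 positives.
X = [[0.1 * i] for i in range(70)] + [[3.0 + 0.2 * i] for i in range(35)]
y = [0] * 70 + [1] * 35
maucs = repeated_cv_mauc(X, y, fit_mean_difference)
print(len(maucs), min(maucs), max(maucs))
```

The 30 mAUC values per model are what a box plot or a multiple-comparison test would then be run on. Each fold must contain both classes, otherwise the pairwise AUC is undefined.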

Model Evaluation - K-fold CV
  • Multiple comparison of mAUCs: one-way ANOVA followed by Tukey multiple comparison.

Rank-ordered significant subgroups from multiple comparison on mean AUC
Note: Subsets of adjacent means that are not significantly different at the 95% confidence level are indicated by a line drawn under the subsets.

Conclusions
  • Summary
      - AUC is the advocated performance measure.
      - Exploratory data analysis helps to analyze the data set.
      - MLPs have the potential to give more reliable predictions.
  • Future work
      - Develop models with kernel methods, e.g. LS-SVM.
      - ANNs are black-box models. A hybrid methodology, grey-box models, might be more promising.