Academia Sinica June. 15 2014, Taipei Symbolic tree for prognosis of Hepato Cellular Carcinoma
Symbolic Tree for Prognosis of Hepato Cellular Carcinoma
June 15, 2014, Taipei
Taerim Lee (1), Hyosuk Lee (2), Edwin Diday (3)
(1) Korea National Open University, trlee@knou.ac.kr
(2) Department of Internal Medicine, SNU Hospital
(3) University of Paris 9 Dauphine, France, diday@ceremade.dauphine.fr
Outline
1. Review of Literature
2. Motivation
3. Tree Structured Classification Model for HCC
4. Symbolic Data Analysis for HCC
5. Remarks
Motivation
1. To develop a powerful modeling technique for exploring the functional form of covariate effects for the prognosis of HCC patients
2. To obtain tree structured prognostic models for HCC with time covariates
3. To extract new knowledge from HCC data using Symbolic Data Analysis
Purposes
1. To identify the effect of prognostic factors of HCC
2. To quantify the patient characteristics that are related to the high risk clinical factors
3. To explore the functional form of the relationships of the covariates
4. To extract new knowledge and fit a symbolic tree model
Previous Work
• Breiman, L., Friedman, J. H., Olshen, R. A., Stone, C. J. (1984) developed Classification and Regression Trees (CART).
• L. Gordon & R. Olshen (1985) presented tree structured survival analysis in Cancer Treatment Reports.
• Ciampi, Thiffault, Nakache & Asselain (1986) proposed a variety of splitting criteria, such as likelihood ratio statistics based on the exponential model or the Cox partial likelihood.
• M. LeBlanc & John Crowley (1992) developed a method for obtaining tree-structured relative risk estimates, using the logrank statistic for splitting and between-node dissimilarity in a pruning algorithm.
• H. Ahn & W. Y. Loh (1994) proposed a method that yields a piecewise-linear Cox proportional hazard model. To reduce computing time, it finds splits with curvature detection tests rather than an exhaustive search that evaluates all possible splits.
• W. Y. Loh & Y. S. Shih (1997) derived split selection methods for classification trees in Statistica Sinica.
• T. R. Lee, H. S. Moon (1994) Prediction Model of craniofacial growth - dental arch classification of 6 and 7 year old children -, The Journal of Korea Society of Dental Health, vol. 21, no. 3.
• T. R. Lee (1998) Classification Model for High Risk Dental Caries with RBF Neural Networks, The Journal of Data Science and Classification, vol. 2 (2).
• T. R. Lee et al. (2006) Independent prognostic factors of 861 cases of oral squamous cell carcinoma in Korean adults, Oral Oncology, vol. 42, pp. 208-217.
• Bock, H. H., Diday, E. (2000) Analysis of Symbolic Data: Exploratory Methods for Extracting Statistical Information from Complex Data. Springer Verlag, Heidelberg.
• Bravo Llatas, M. C. (2000) Strata decision tree symbolic data analysis software, Data Analysis, Classification and Related Methods, Springer Verlag, pp. 409-415.
• T. R. Lee (2009) Tree Structured Prognostic Model for Hepatocellular Carcinoma, Journal of Korea Health Information & Statistics, vol. 28, no. 1.
• T. R. Lee (2011) Survival tree for Hepato Cellular Carcinoma patient, Journal of Korean Society of Public Health Information & Statistics.
• V. Patel, S. Leethanakul (2001) reported new approaches to the understanding of the molecular basis of oral cancer.
• Billard, L., Diday, E. (2003) looked at the concept of SDA in general and reviewed the methods available to analyze such data, in "From the Statistics of Data to the Statistics of Knowledge".
• Mballo, C., Diday, E. (2005) compared the Kolmogorov-Smirnov criterion and the Gini index as test selection metrics for decision tree induction.
Tree Structured Classification
Tree Model
• Tree structured classification modeling constructs classification rules based on the information provided in a learning sample of objects with known identities.
[Figure: schematic tree with root node "total", splits X1 > a, X2 > b, X3 > c, X4 > d, and terminal nodes labeled D or L]
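A minimal sketch of how such classification rules can be stored and applied once a tree has been learned. The tree layout, the thresholds, and the covariate names `X1` and `X2` are illustrative assumptions, not the fitted HCC model.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Node:
    """Either an internal split on one covariate or a terminal label."""
    label: Optional[str] = None       # set only on terminal nodes
    variable: Optional[str] = None    # covariate tested at an internal node
    threshold: float = 0.0            # go right when value > threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def classify(node: Node, case: Dict[str, float]) -> str:
    """Follow the splits from the root until a terminal node is reached."""
    while node.label is None:
        node = node.right if case[node.variable] > node.threshold else node.left
    return node.label


# Hypothetical tree: X1 > 1.0 first, then X2 > 2.0 (D: death, L: live).
tree = Node(variable="X1", threshold=1.0,
            left=Node(label="L"),
            right=Node(variable="X2", threshold=2.0,
                       left=Node(label="L"),
                       right=Node(label="D")))

print(classify(tree, {"X1": 0.5, "X2": 5.0}))
print(classify(tree, {"X1": 1.5, "X2": 2.5}))
```

Each root-to-leaf path reads as one rule, which is what makes the tree easy to interpret.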
Logistic Regression Model
By stepwise Logistic Regression Analysis (LRA), four variables were used to construct the logistic regression model.
• The model is as follows: [equation missing]
• Log Likelihood = 611.989, p = 0.0004
• Goodness of fit chi-sq = 569.34, p = 0.02
Schematic comparison of a classification tree and logistic regression equation for risk assessment
CART
H: High risk, L: Low risk
[Figure: tree structured prognostic model with effective covariates; splits X1 > a, X2 > b, X3 > c, X4 > d; terminal nodes labeled H or L]
• CART uses a decision tree to display how data may be classified or predicted.
• It automatically searches for important relationships and uncovers hidden structure even in highly complex data.
FACT
H: High risk, L: Low risk
[Figure: tree structured prognostic model with effective covariates; splits X1 > a, X2 > b, X3 > c, X4 > d; terminal nodes labeled H or L]
• FACT employs a statistical hypothesis test to select a variable for splitting each node and then uses discriminant analysis to find the split point. The size of the tree is determined by a set of rules.
QUEST
D: death, L: live
[Figure: tree with splits X2 > b, X4 + 2 X1 > a, X3 > c, X4 > d; terminal nodes labeled D or L]
• QUEST is a classification tree algorithm derived from the FACT method. It can be used with univariate splits or linear combination splits. Unlike FACT, QUEST uses cross-validation pruning. What distinguishes it from other decision tree classifiers is that, when used with univariate splits, it performs approximately unbiased variable selection.
DATA
Classification Tree Model
H: High Risk group, L: Low Risk group
CART
Fig. 4 Tree Structured Model for TACE group of HCC data
[Figure: classification tree. Root node: 94 cases, 46 in class 0 and 48 in class 1. Split rules shown: CHILD ≤ 5.5, TAENUM ≤ 1.5, INV ≤ 0.5, SIZE ≤ 3.85, AFP ≤ 10.4]
Sensitivity 71.7%, Specificity 85.4%, Total 78.7%
Variable ranking: 1. TAENUM 100.0, 2. AFP 87.7, 3. CHILD 72.3, 4. SIZE 59.4, 5. INV 59.0, 6. CLIP 45.5
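The reported rates can be checked with simple arithmetic. The sketch below assumes they are per-class proportions of correctly classified cases on the 46 and 48 cases at the root. The correct counts 33 and 41 do not appear on the slide; they are back-calculated here as the integers that reproduce the three percentages.

```python
def rate(correct: int, total: int) -> float:
    """Percentage of correctly classified cases, rounded to one decimal."""
    return round(100.0 * correct / total, 1)


n_class0, n_class1 = 46, 48     # root-node counts from the figure
correct0, correct1 = 33, 41     # ASSUMED: back-calculated, not on the slide

sensitivity = rate(correct0, n_class0)
specificity = rate(correct1, n_class1)
total = rate(correct0 + correct1, n_class0 + n_class1)

print(sensitivity, specificity, total)
```

Under this reading, "Total" is the overall proportion correct, (33 + 41) / 94.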
RBF Neural Network Classification
Block diagram representation of nervous system: Stimulus → Receptors → Neural net → Effectors → Response
RBF NN
ROC curve according to the Radial Basis Function
Classification results
Kernel V16, V17, V19 66.3 64.2
Survival Tree
Survival Data
The response variable is survival time: the length of time a patient has survived after diagnosis. Censoring is common, since the endpoint may not be observed because of termination of the study or failure to follow up.
Cox Proportional Hazard Model
Data $(Y_i, \delta_i, x_i)$, where
• $Y_i = \min(Z_i, C_i)$ is the minimum of a failure time $Z_i$ and a censoring time $C_i$,
• $\delta_i = I(Z_i \le C_i)$ is an indicator of the event that a failure is observed,
• $x_i = (x_{1i}, \dots, x_{pi})'$ is a p-dimensional column vector of covariates.
Let $\lambda(y \mid x)$ be the hazard rate at time $y$ for an individual with risk factor $x$. The Cox proportional hazard model is
$$\lambda(y \mid x) = \lambda_0(y) \exp(\beta' x),$$
where $\beta = (\beta_1, \dots, \beta_p)'$ are unknown parameters and $\lambda_0(y)$ is the baseline hazard rate at time $y$.
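A minimal sketch of the Cox partial log-likelihood, which is what is maximized to estimate the unknown parameters without specifying the baseline hazard. It assumes no tied failure times. The function name and the toy data are hypothetical and are not taken from the HCC study.

```python
import math


def cox_partial_loglik(times, events, covariates, beta):
    """Cox partial log-likelihood, assuming no tied failure times."""
    # Linear predictor beta'x_i for every individual.
    eta = [sum(b * v for b, v in zip(beta, x)) for x in covariates]
    loglik = 0.0
    for i, (y_i, d_i) in enumerate(zip(times, events)):
        if d_i == 1:  # only observed failures contribute a term
            # Risk set: individuals still under observation at time y_i.
            risk = sum(math.exp(eta[j])
                       for j, y_j in enumerate(times) if y_j >= y_i)
            loglik += eta[i] - math.log(risk)
    return loglik


# Hypothetical toy data: observed times, failure indicators, one covariate.
times = [2.0, 3.0, 5.0, 7.0]
events = [1, 0, 1, 1]
covariates = [[1.0], [0.0], [1.0], [0.0]]

print(cox_partial_loglik(times, events, covariates, [0.0]))
print(cox_partial_loglik(times, events, covariates, [0.5]))
```

With all coefficients at zero, each failure contributes minus the log of its risk set size, so the toy data give -(log 4 + log 2 + log 1) = -log 8.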
STUDI
S: short term survive, L: long term survive
[Figure: schematic survival tree with splits X1 > a, X2 > b, X3 > c, X4 > d and terminal nodes labeled S or L]
Survival Tree with Unbiased Detection of Interaction
• STUDI is a tree-structured regression modeling tool. It is easy to interpret and to predict the survival value for a new case. Missing values can easily be handled, and time dependent covariates can be incorporated.
Split Covariate Selection
1. Fit a model to the n- and f-covariates in the node.
2. Obtain the modified Cox-Snell residuals.
3. Perform a curvature test for each of the n-, s-, and c-covariates.
4. Perform an interaction test for each pair of n-, s-, and c-covariates.
5. Select the covariate which has the smallest p-value.
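Steps 3 and 5 can be illustrated with a simplified stand-in for the curvature test. The sketch below cross-tabulates each covariate (above or below its median) against the residuals (above or below their median), computes a 1 df chi-square p-value, and picks the covariate with the smallest one. The median dichotomization, the function names, and the toy data are assumptions for illustration; this is not the exact STUDI test, and the interaction test of step 4 is omitted.

```python
import math
from statistics import median


def chi2_2x2_pvalue(a, b, c, d):
    """Pearson chi-square p-value (1 df) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # a degenerate margin carries no evidence of association
    stat = n * (a * d - b * c) ** 2 / denom
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(stat / 2.0))


def association_pvalue(residuals, values):
    """Test residual (high/low) against covariate (high/low), median cuts."""
    r_cut, v_cut = median(residuals), median(values)
    table = [[0, 0], [0, 0]]
    for r, v in zip(residuals, values):
        table[int(r > r_cut)][int(v > v_cut)] += 1
    (a, b), (c, d) = table
    return chi2_2x2_pvalue(a, b, c, d)


def select_split_covariate(residuals, covariates):
    """Return the covariate with the smallest p-value, plus all p-values."""
    pvalues = {name: association_pvalue(residuals, vals)
               for name, vals in covariates.items()}
    return min(pvalues, key=pvalues.get), pvalues


# Hypothetical residuals and two hypothetical covariates.
residuals = [0.1, 0.2, 0.3, 0.4, 1.1, 1.2, 1.3, 1.4]
covariates = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],   # tracks the residuals closely
    "x2": [1, 8, 2, 7, 3, 6, 4, 5],   # unrelated to the residuals
}
best, pvalues = select_split_covariate(residuals, covariates)
print(best, pvalues)
```

Selecting by p-value rather than by the best achievable split is what avoids favoring covariates that offer many candidate split points.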
STUDI
Survival Tree with Unbiased Detection of Interaction, Cho & Loh (2001)
- STUDI is a tree structured regression modeling tool.
- It is easy to interpret and to predict the survival value for a new case.
- Missing values can easily be handled, and time dependent covariates can be incorporated.
STUDI
Let the survival function for a covariate $x_i$ be
$$S(y \mid x_i) = \exp\{-\Lambda_0(y) \exp(\beta' x_i)\},$$
where $\Lambda_0(y)$ is the cumulative baseline hazard rate. Then the median survival time for an individual $i$ is defined as
$$m_i = \inf\{y : S(y \mid x_i) \le 0.5\},$$
and the cost at a node $t$ is defined as [equation missing].
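A minimal sketch of the median survival time under the Cox model, assuming the cumulative baseline hazard is available as a non-decreasing step function evaluated at increasing time points. Since S(y|x) ≤ 0.5 is equivalent to Λ0(y) exp(β'x) ≥ log 2, the median is the first time at which that threshold is reached. The function name and the numbers are hypothetical.

```python
import math


def median_survival(times, cum_hazard, lin_pred):
    """Smallest time y with S(y|x) <= 0.5, or None if never reached.

    times      -- increasing time points
    cum_hazard -- cumulative baseline hazard at those time points
    lin_pred   -- linear predictor beta'x of the individual
    """
    target = math.log(2.0)
    for t, h in zip(times, cum_hazard):
        if h * math.exp(lin_pred) >= target:
            return t
    return None


# Hypothetical cumulative baseline hazard.
times = [1.0, 2.0, 3.0, 4.0]
cum_hazard = [0.2, 0.5, 0.9, 1.4]

print(median_survival(times, cum_hazard, 0.0))            # baseline individual
print(median_survival(times, cum_hazard, math.log(2.0)))  # hazard ratio of 2
print(median_survival(times, cum_hazard, -2.0))           # median not reached
```

A larger linear predictor scales the cumulative hazard up, so the median is reached earlier.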
Tree Structured Survival Model STUDI
Modified Cox-Snell (MCS) residuals:
$$r_i = \hat{\Lambda}_0(Y_i) \exp(\hat{\beta}' x_i) + 1 - \delta_i \quad \text{for } i = 1, \dots, n,$$
where $\hat{\Lambda}_0$ is the estimator of the cumulative baseline hazard function.
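A minimal sketch of the residual computation, assuming the commonly used definition in which the Cox-Snell residual Λ̂0(Y_i) exp(β̂'x_i) is increased by 1 for censored observations. Whether the slide used exactly this form is an assumption, and the function name and inputs are hypothetical.

```python
import math


def modified_cox_snell(cum_hazard_at_y, lin_pred, events):
    """Cox-Snell residual plus 1 for censored cases (ASSUMED definition).

    cum_hazard_at_y -- estimated cumulative baseline hazard at each Y_i
    lin_pred        -- fitted linear predictor beta'x_i
    events          -- 1 if the failure was observed, 0 if censored
    """
    return [h * math.exp(eta) + (1 - d)
            for h, eta, d in zip(cum_hazard_at_y, lin_pred, events)]


# Two hypothetical individuals with equal fitted risk, one of them censored.
print(modified_cox_snell([0.5, 0.5], [0.0, 0.0], [1, 0]))
```

The adjustment moves censored cases toward the value they would be expected to have had if followed to failure, so that their residuals are comparable with those of observed failures.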
Fig 4. Scatter plot and box plot of the MCS residuals
Fig. 11 Tree Structured Survival Model with SNP and Clinical Data of HCC using imputed 252 missing data
Fig. 6 Tree structured Survival model for OSCC
[Figure: survival tree with splits on Pstage, txmethod, Radio, Age, size, and Site; node numbers, node sizes, and survival values are not recoverable]
SDA (Symbolic Data Analysis)
1. To generalize data mining and statistics to higher level units described by symbolic data
2. To extract new knowledge from a database by using a standard data table
3. To work on higher level units called concepts, necessarily described by more complex data, extending data mining to knowledge mining
From data mining to knowledge mining
1. An SDA needs two levels of units. The first level: individuals. The second level: concepts.
2. A concept is described by using the description of the class of individuals of its extent.
3. The description of a concept must express the variation of the individuals of its extent.
4. The output of SDA provides new symbolic objects associated with new categories, i.e. categories of concepts.
SDA steps
1. Relational database: composed of several more or less linked data
2. Define a set of categories based on the categorical variables from a query on the given relational database
3. Identify the class of individuals which defines the extent of each category
4. Apply a generalization process to the subset of individuals belonging to the extent of each concept
5. Define a symbolic data table
6. Symbolic Data Analysis
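Steps 2 to 5 can be sketched as a group-and-generalize operation: individuals are grouped by a categorical variable that defines the concepts, and each group is summarized by an interval for a numeric variable and a frequency distribution for a categorical one. The variable names `stage`, `size`, and `lc` and the four records are hypothetical, not the HCC data.

```python
from collections import Counter, defaultdict


def build_symbolic_table(rows, concept_key, interval_vars, modal_vars):
    """Aggregate individuals into one symbolic description per concept."""
    groups = defaultdict(list)
    for row in rows:                      # extent of each concept
        groups[row[concept_key]].append(row)

    table = {}
    for concept, members in groups.items():
        desc = {}
        for var in interval_vars:         # numeric variable -> [min, max]
            vals = [m[var] for m in members]
            desc[var] = (min(vals), max(vals))
        for var in modal_vars:            # categorical -> relative frequencies
            counts = Counter(m[var] for m in members)
            desc[var] = {cat: n / len(members) for cat, n in counts.items()}
        table[concept] = desc
    return table


# Hypothetical individual-level records.
rows = [
    {"stage": "I", "size": 2.0, "lc": 0},
    {"stage": "I", "size": 3.5, "lc": 1},
    {"stage": "II", "size": 5.0, "lc": 1},
    {"stage": "II", "size": 4.0, "lc": 1},
]
table = build_symbolic_table(rows, "stage", ["size"], ["lc"])
for concept, desc in table.items():
    print(concept, desc)
```

Each row of the resulting table describes a concept rather than an individual, and keeps the within-concept variation that a simple mean would discard.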
The main steps for an SDA
§ Put the data in a relational database
§ Define a context by giving the units and classes
§ Build a symbolic data table
§ Apply SDA tools: decision tree, clustering, graphical visualization
SDA Advantage
§ Aggregated data representation
§ Confidentiality preservation
§ Data volume reduction
Symbolic Object = intension (symbolic description + recognition function of the extension) + extension (individuals represented by the concept)
E.g. [sex ~ (man(0.8), woman(0.2))] ^ [region ~ {city, rural}] ^ [salary ~ [1.2, 3.1]]
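The example object can be written as a small data structure with a recognition function that computes its extension. The matching rule used here (a category must have positive weight, a value must lie in the set or inside the interval) is one simple choice made for illustration, and the three individuals are hypothetical.

```python
def matches(description, individual):
    """Recognition function: is the individual compatible with the description?"""
    for var, spec in description.items():
        value = individual[var]
        if isinstance(spec, tuple):       # interval [low, high]
            low, high = spec
            if not (low <= value <= high):
                return False
        elif isinstance(spec, dict):      # modal variable: category -> weight
            if spec.get(value, 0.0) <= 0.0:
                return False
        elif value not in spec:           # set of admissible categories
            return False
    return True


# Symbolic description taken from the example on the slide.
description = {
    "sex": {"man": 0.8, "woman": 0.2},
    "region": {"city", "rural"},
    "salary": (1.2, 3.1),
}

# Hypothetical individuals.
individuals = [
    {"id": 1, "sex": "man", "region": "city", "salary": 2.0},
    {"id": 2, "sex": "woman", "region": "rural", "salary": 3.5},
    {"id": 3, "sex": "woman", "region": "suburb", "salary": 1.5},
]

extension = [p["id"] for p in individuals if matches(description, p)]
print(extension)
```

Individual 2 is rejected on salary and individual 3 on region, so only individual 1 belongs to the extension.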
Symbolic Object
SDA Schematic expression
SDA Input Symbolic Data: description of individual concepts, column symbolic variable
Symbolic Data Table
Symbolic Data variable
Input Symbolic Data: 2D Zoom Visualization
3D Zoom Stars
2D and 3D Zoom Stars
Fig 5. SDA results according to the newly defined concept of metastasis & prognosis
Symbolic Tree
Clustering Tree
Table 1. Patients Baseline characteristics of HCC
Table 2 Cox proportional hazards model for metastasis-free survival
Fig 1. Survival curve of metastasis free HCC
Fig 2. Survival curve of metastasis free HCC patient according to AJCC stage
Fig 3. Survival curve of metastasis free HCC patient according to the histologic response
Fig. 11 Tree Structured Survival Model with SNP and Clinical Data of HCC using imputed 252 missing data
Fig. 5 Characterization of the classes according to the evolution of HCC: individuals free of HCC and liver cirrhosis but bCL (2x0xbCL), individuals free of HCC with diagnosis 3 and liver cirrhosis (3x1xbCL), individuals with diagnosis 4, liver cirrhosis, and acute HCC (4x1xaHCC), and individuals with diagnosis 6 and acute HCC and free of liver cirrhosis (6x0xaHCC).
Fig. 5 Comparison of the partition "free of HCC" class, which groups the 12 clinical variables and 3 gene data with the lowest frequency of degradation (3x1xbCL), against the partition "HCC" class, which contains the larger variance with the highest frequency of degradation (4x1xaHCC).
Fig. 6 The most discriminating variables to influence HCC prognosis.
§ INPUT: the symbolic data table with three concepts
§ OUTPUT: a symbolic data table with three rows associated to the three concepts. The first column represents the frequencies of missing data "."; the others represent the frequencies of LC = 0 and, last, the frequencies of LC = 1.
Fig. 7 Maps of the distribution in the categories of "Encephalopathy" and "Ascites" on the first factorial plane. Data with the lowest frequency of degradation (3x1xbCL), against the partition HCC class that contains the larger variance with the highest frequency of degradation (4x1xaHCC).
Fig. 8 Correlation circle between the first more discriminating symbolic variables and some of their bins. The plane and symbolic variables are shown in the smallest square which contains the first quadrant of the circle of correlation. Figure 8 shows the distribution of the categories of the weight. From the representation of the weight on the left, it can be seen that the class (3x1xbCL), i.e. individuals with HCC at baseline and at the end of the study, are the lightest individuals, and from the representation of the encephalopathy on the right, we can see that the class 3x1xbCL of degradation has greater ascites as well.
Fig. 9 Correlation circle of the whole set of gene variables and bins of clinical variables
Symbolic Tree for HCC
§ INPUT: Patient data table
§ OUTPUT: Decision tree and rules
§ Notice that Symbolic TREE is better adapted to working on concepts than on individuals.
Fig. 10 Symbolic Tree for HCC
Table 3 List of the most characteristic bins of 3x1xbCL and 4x1xaHCC
Remarks
1. The application of tree structured classification gave an easily interpretable method with a small number of predictor variables.
2. The application of SDA results in more detailed information and symbolic descriptions of classes.
3. SDA gave more practical information with visualization graphs and diagrams.
Q&A
Thank you!