Numerical Analysis
EE, NCKU
Tien-Hao Chang (Darby Chang)
Correlation coefficient: Two continuous variables
Correlation coefficient (CC)
- What we need is a single summary number that answers the following questions:
  - does a relationship exist?
  - if so, is it a positive or a negative relationship?
  - is it a strong or a weak relationship?
- The correlation coefficient is a single summary number that gives you a good idea of how closely one variable is related to another variable
Correlation coefficient: Two-way scatter plot
The mortality rate tends to decrease as the percentage of children immunized increases
Correlation coefficient: Pearson's correlation coefficient
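As a minimal sketch, assuming the standard textbook definition, Pearson's r is the sum of products of deviations from the means, divided by the square root of the product of the two sums of squared deviations. The function name pearson_r and the sample numbers are hypothetical and are not taken from the course data.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares of the deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data, invented for illustration only.
immunized = [20, 40, 60, 80, 95]      # percentage of children immunized
mortality = [200, 150, 110, 60, 20]   # mortality rate
r = pearson_r(immunized, mortality)
print(round(r, 3))  # negative: mortality falls as immunization rises
```

The value always lies between -1 and 1; the sign gives the direction of the relationship and the magnitude gives its strength.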
Figure: http://upload.wikimedia.org/wikipedia/commons/0/02/Correlation_examples.png
Correlation coefficient: Correlation coefficient is not a percent
Correlation coefficient: Coefficient of determination
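A minimal sketch of the coefficient of determination, assuming a simple least-squares line of y on x: it is the fraction of the variance in y that the fitted line explains, and for this simple case it equals the square of Pearson's r. The function name and the data are hypothetical.

```python
def coefficient_of_determination(xs, ys):
    """R^2 of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual (unexplained) and total sums of squares.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical data for which r = 0.8.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
r2 = coefficient_of_determination(xs, ys)
print(round(r2, 2))  # 0.64
```

Here r = 0.8, but the line explains only 64% of the variance in y, which is why r itself should not be read as a percentage.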
Statistical test
Correlation coefficient: Statistical inference
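One common way to test whether a sample correlation differs from zero is a t test with n - 2 degrees of freedom, t = r * sqrt((n - 2) / (1 - r^2)). The sketch below assumes that test (and roughly bivariate-normal data); the function name is hypothetical, and the critical value 3.182 is the tabulated two-sided 5% value for 3 degrees of freedom.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for H0: population correlation = 0, with n - 2 d.f."""
    if n <= 2:
        raise ValueError("need more than two observations")
    if abs(r) >= 1:
        raise ValueError("t is undefined for |r| = 1")
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical example: r = 0.8 observed on only n = 5 pairs.
t = correlation_t_statistic(0.8, 5)
critical = 3.182  # two-sided 5% critical value of t with 3 d.f. (from a t table)
significant = abs(t) > critical
print(round(t, 3), significant)  # 2.309 False
```

Even a correlation as large as 0.8 is not significant at the 5% level when it is based on only five pairs, so the sample size matters as much as the value of r.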
Correlation coefficient: Limitations
- It quantifies only the strength of the linear relationship between two variables
- Care must be taken when the data contain any outliers, or pairs of observations that lie considerably outside the range of the other data points
- A high correlation between two variables does not imply a cause-and-effect relationship
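The first two limitations can be illustrated with a small sketch. All numbers are invented for illustration: a perfect but nonlinear relationship (y = x^2) gives r = 0, and a single outlier turns an uncorrelated sample into one with r close to 1.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient (textbook definition)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# 1) Perfect nonlinear dependence, yet no linear correlation.
xs = [-2, -1, 0, 1, 2]
r_parabola = pearson_r(xs, [x * x for x in xs])

# 2) An uncorrelated sample, then the same sample plus one outlier.
base_x, base_y = [1, 2, 3, 4, 5], [2, 1, 2, 1, 2]
r_without = pearson_r(base_x, base_y)
r_with = pearson_r(base_x + [20], base_y + [20])

print(r_parabola, round(r_without, 3), round(r_with, 3))
```

In both cases a scatter plot reveals the problem immediately, which is a good reason to plot the data before trusting a single summary number.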
Figure (Anscombe's quartet): http://upload.wikimedia.org/wikipedia/commons/thumb/e/ec/Anscombe%27s_quartet_3.svg/2000px-Anscombe%27s_quartet_3.svg.png
Correlation coefficient: Spearman's rank CC
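A minimal sketch of Spearman's rank correlation, assuming the usual definition: replace each value by its rank (tied values share the average rank) and compute Pearson's r on the ranks. The function names and the data are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# A monotonic but nonlinear relationship (y = x^3).
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
rho = spearman_rho(xs, ys)
r = pearson_r(xs, ys)
print(rho, round(r, 3))  # rho is 1.0, while Pearson's r is below 1
```

Because it depends only on the ordering of the values, the rank correlation detects any monotonic relationship and is less sensitive to outliers than Pearson's r.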
Any questions about the correlation coefficient?
Statistical inference
- Basic tests
  - tests about proportions
  - tests about one mean
  - tests of the equality of two means
  - tests for variances
  - references
    - http://zoro.ee.ncku.edu.tw/mlb2009/res/14-ch5.pdf (pp. 27-33)
    - http://www.math.isu.edu.tw/finance/course/sta/ch8.ppt
    - http://www.tnb.org.tw/Image/ttest.ppt
    - http://www.mis.ncyu.edu.tw/course/download/cftai/Chapter%206.%20Continuous%20Probability%20Distribution.PPT
- More advanced tests
  - ANOVA (analysis of variance)
  - goodness of fit (Wilcoxon test, Kolmogorov-Smirnov test, …)
Multivariate analysis
- Statistics
  - ANOVA
  - Multiple linear regression
    - http://www.sjsu.edu/faculty/gerstman/biostat-text/Gerstman_PP15.ppt
    - http://www.stat.nuk.edu.tw/Ray-Bing/regression/Chapter3.ppt
  - PCA (principal component analysis)
  - ICA (independent component analysis)
  - LDA (linear discriminant analysis)
- So far, all techniques belong to statistics. You can find them in most statistical software, such as MATLAB, R (http://www.r-project.org/), SPSS, …
- Machine learning
  - Naïve Bayes (http://zoro.ee.ncku.edu.tw/mlb2009/res/11-ch4.pdf, pp. 13-27)
  - LIBSVM (http://www.csie.ntu.edu.tw/~cjlin/libsvm/)
  - RVKDE (http://mbi.ee.ncku.edu.tw/wiki/doku.php?id=rvkde)
Let's see an Excel tutorial
Let's see the data
Points to a good final project
Points to a good final project
- Raise some interesting issues
  - from observations
  - you have at least two trap issues (next slide)
- Design good analyses
  - make sure that your analyses fit your issues
  - do the results concur with your speculations?
  - design further analyses
Predict masked disease codes
[Data excerpt labels: acode, class: name]
- There are some "masked" disease codes
  - for example, disease #14 has no acode, class, or name
- First, predict the masked disease names
- Second, some masked diseases have names that are not in the file (namely, novel diseases). Try to identify them and, if possible, figure out what diseases they are
The final project includes
- Presentation
  - slides (.ppt) and how you present them
  - convincing to me and your classmates
  - reasonably evaluate other works (voting on others' works if we have time)
- Project
  - scripts (executable)
  - results (.txt, .xls, …)
  - a step-by-step README of how you get the results from cd.dat (.txt, .doc, …)
- Report
  - a more detailed document of your slides (.doc)
  - the duties of each group member
  - anything worthy of extra credit
Final grade
- Email all the materials to darby@mail.ncku.edu.tw before 2011/6/20 23:59
- The raw grade will be available as soon as the final project of your group is received
- Ask me (darby@mail.ncku.edu.tw) about your grade with your NCKU email account
- The final (adjusted) grades must wait for all groups (2011/6/21, I hope)
- You have about one week to double-check the grade, and the final grades will be submitted around 2011/6/27