Introduction to Statistical Analysis

What is STATISTICS?
• Statistics fulfills one of the basic human needs.
• A process to:
  - Manage: clean and format the data to obtain a valid dataset that can be analyzed.
  - Analyze: explore the data in order to answer the study objective.
  - Interpret: translate the statistical results into terms a general audience can understand.

Classification of Statistics
• Descriptive statistics: describe the data by summarizing them.
• Inferential statistics: techniques by which
  - inferences are drawn about population parameters from sample statistics, OR
  - conclusions are made about the population using sample data.

What are descriptive statistics for?
In any study, before answering the research question, we should recognize the characteristics of the sample (e.g. age, gender, ethnicity, etc.).

How to describe a categorical variable?
• Statistics
  - Frequency
  - Relative frequency
  - Cumulative relative frequency
• Figure/Chart
  - Bar
  - Pie
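As a rough illustration of these three statistics, the sketch below builds a frequency table for a categorical variable using only the Python standard library. The function name `frequency_table` and the ten gender values are hypothetical and are not taken from the slides.

```python
from collections import Counter

def frequency_table(values):
    """Return rows of (category, frequency, relative freq., cumulative relative freq.)."""
    counts = Counter(values)
    total = sum(counts.values())
    rows = []
    cumulative = 0.0
    # Sort the categories so the cumulative column has a reproducible order.
    for category in sorted(counts):
        relative = counts[category] / total
        cumulative += relative
        rows.append((category, counts[category], relative, cumulative))
    return rows

# Hypothetical sample of 10 respondents (illustrative values only).
gender = ["male", "female", "female", "male", "female",
          "female", "male", "female", "female", "male"]

for category, freq, rel, cum in frequency_table(gender):
    print(f"{category:<8}{freq:>4}{rel:>8.2f}{cum:>8.2f}")
```

For a nominal variable such as gender the cumulative column depends on an arbitrary ordering, so it is usually more informative for ordinal variables.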

How to describe a numerical variable?
• Statistics
  - Central tendency
    - Mean
    - Median (50th percentile)
  - Dispersion
    - Standard deviation
    - Inter-quartile range (IQR) (3rd quartile – 1st quartile)
• Figure/Chart
  - Histogram/Frequency polygon
  - Box plot
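A minimal sketch of these summaries with Python's `statistics` module follows. The `describe` helper and the exercise durations are hypothetical. Quartile conventions differ between packages, so the IQR computed here (the module's default "exclusive" method) may not match other software exactly.

```python
import statistics

def describe(values):
    """Central tendency and dispersion for a numerical variable."""
    # quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _q2, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),   # sample standard deviation (n - 1)
        "iqr": q3 - q1,                   # 3rd quartile minus 1st quartile
    }

# Hypothetical exercise durations in minutes/day (illustrative values only).
exercise_minutes = [12, 15, 14, 18, 16, 20, 13, 17, 15, 19]

for name, value in describe(exercise_minutes).items():
    print(f"{name:<7}{value:.2f}")
```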

Inferential statistics
• With inferential statistics, we take a sample (a small subset of a larger set of data).
• We then use this sample to draw inferences or make generalizations about the population from which the sample was drawn.
• Estimation (confidence interval)
  - E.g.: estimation of a mean, estimation of a proportion
• Hypothesis test
  - E.g.: comparing means, comparing proportions, association between 2 variables

Estimation
• In estimation, the sample is used to estimate a population parameter, and a confidence interval around the estimate is constructed.
• Estimation (CI) of a mean:
  E.g.: x̄ = 16.14, 95% CI = (15.30, 16.98)
  We are 95% confident that the population mean duration of exercise lies between 15.30 and 16.98 minutes/day.
• Estimation (CI) of a proportion:
  E.g.: p = 0.37, 95% CI = (0.27, 0.47)
  We are 95% confident that the prevalence of obesity in the population lies between 27% and 47%.
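The slide does not say how its intervals were computed, so the following is only a sketch of one common approach: large-sample (normal approximation) intervals for a mean and for a proportion. The sample size n = 100 and the standard deviation of 4.3 are assumptions made for illustration; neither is stated on this slide. Under those assumptions the results should land close to the slide's intervals, though not necessarily identical to them.

```python
import math
from statistics import NormalDist

def ci_mean(mean, sd, n, level=0.95):
    """Large-sample (z-based) confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

def ci_proportion(p, n, level=0.95):
    """Wald (normal-approximation) confidence interval for a proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

# ASSUMPTIONS: n = 100 and sd = 4.3 are illustrative, not from the slides.
low, high = ci_mean(mean=16.14, sd=4.3, n=100)
print(f"mean:       95% CI = ({low:.2f}, {high:.2f})")

low, high = ci_proportion(p=0.37, n=100)
print(f"proportion: 95% CI = ({low:.2f}, {high:.2f})")
```

For small samples a t-based interval for the mean is wider than this z-based one, and exact or score intervals are often preferred for proportions.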

Hypothesis
• A testable statement that describes relationships between variables.
• Derived from research questions.
• Postulates the existence of:
  1. A difference between groups.
  2. An association among factors.
• Null hypothesis (H0): the hypothesis to be tested, stating no difference.
• Alternative hypothesis (Ha): the hypothesis that postulates a treatment effect or a difference between groups.
• The process of inferential statistics is to judge whether we have enough evidence (based on probability) to reject or fail to reject H0.

Statistical Procedure
1. Sample Size
2. Data Collection
3. Data Quality
4. Statistical Analysis
5. Interpretation

Sample Size

What Is Sample Size?
• Sample size is the number of units (persons, animals, patients, specific circumstances, etc.) in a population that need to be studied to represent the population.

Why Do We Need To Calculate Sample Size?
• Guide: when to start and stop collecting? How are we going to collect it?
• Minimum required sample: depends on availability of the sample, time constraints, subject constraints and ethical issues.
• Study design: influences the quality and accuracy of the research.
• Economic: resources are wasted if the study is not capable of producing useful results.

Process of Sample Size Determination

Journal sample size for pilot study

What Happens If the Sample Size Is...
Too small
• A well-conducted study may fail to answer its research question.
• May fail to detect important effects.
• May estimate those effects imprecisely.
Sample size should be adequate to achieve good precision in estimation.
Too large
• Costly: the longer the study, the higher the cost.
• Difficulties: lack of manpower and time.
• Tiring: recruiting subjects or collecting outcomes over a long period may be tiring.
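To make the link between sample size and precision concrete, here is a sketch of the standard formula for estimating a single proportion, n = z² p(1 − p) / d², where p is the expected proportion and d is the desired half-width of the confidence interval. The slides do not show which formula they use, so applying this one is an assumption, and the inputs (p = 0.37, d = 0.05 or 0.10) are illustrative.

```python
import math
from statistics import NormalDist

def sample_size_proportion(expected_p, precision, level=0.95):
    """Minimum n so that the CI half-width for a proportion is about `precision`."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    n = z ** 2 * expected_p * (1 - expected_p) / precision ** 2
    return math.ceil(n)  # round up: a fraction of a subject cannot be recruited

# Illustrative inputs only: expected prevalence 37%, precision of 5 or 10 points.
for d in (0.05, 0.10):
    print(f"precision ±{d:.2f}: n = {sample_size_proportion(0.37, d)}")
```

Halving the desired half-width roughly quadruples the required sample, which is why a very tight precision quickly becomes costly. In practice the result is usually inflated further to allow for non-response.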

Statistical Software: Website address
• Power and sample size: http://biostat.mc.vanderbilt.edu/wiki/Main/PowerSampleSize
• Epi Info software: http://wwwn.cdc.gov/epiinfo/html/downloads.htm
• EpiCalc: http://www.brixtonhealth.com/epicalc.html
• Sample size for Prevalence Studies.xls (Lin Naing)
• Sample size for Sensitivity and Specificity.xls (Lin Naing)
• Power Analysis and Sample Size (PASS): the most powerful, but a license must be purchased first: http://www.ncss.com/pass.html

Data Collection

Data
CATEGORICAL
• Nominal, e.g.: gender, race
• Ordinal: logical ordering to the categories, e.g.: education level, pain severity
• Statistics:
  - Frequency & percentage
  - Relative frequency & percentage
  - Cumulative frequency & percentage
• Figure/chart:
  - Bar
  - Pie
NUMERICAL
• Continuous, e.g.: age, weight, height
• Statistics: central tendency & dispersion
  - Mean & SD (if normality is assumed)
  - Median & IQR (if skewed)
• Figure/chart:
  - Histogram
  - Boxplot

DATA ENTRY
Before key-in: what to prepare?
• INSTRUMENT/QUESTIONNAIRE
  ◦ Purpose and objective
  ◦ Variables and units
  ◦ Format
• DATA DEFINITION/DATA DICTIONARY
  ◦ Summarizes the variables in terms of variable name, description, formatting and labelling (where necessary/applicable).

DATA DEFINITION/DATA DICTIONARY

Data Quality

DATA CLEANING
• DUPLICATES: more than one observation has the same patientid.
• MISSING VALUES: blank cell without information.
• INCONSISTENCY: e.g. a value of 3 has no meaning in the definition of ptgender.
• EXTREME VALUES: value exceeds the upper limit.
Definition:
  - patientid: patient identification number.
  - ptgender: 1 is Male and 2 is Female.
  - height: defined from 1.4 m to 1.8 m.
  - weight: defined from 50.0 kg to 150.0 kg.
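The four checks can be sketched in a few lines of Python. The variable names and limits come from the data definition above. The `check_records` helper and the five sample records are hypothetical, and values below the lower limit are also flagged here, which goes slightly beyond the slide's "exceeds the upper limit".

```python
from collections import Counter

# Limits taken from the data definition on the slide.
VALID_GENDER = {1, 2}
RANGES = {"height": (1.4, 1.8), "weight": (50.0, 150.0)}

def check_records(records):
    """Return a list of (patientid, problem) tuples for the four cleaning checks."""
    problems = []
    id_counts = Counter(r["patientid"] for r in records)
    for r in records:
        pid = r["patientid"]
        if id_counts[pid] > 1:
            problems.append((pid, "duplicate patientid"))
        for field in ("ptgender", "height", "weight"):
            if r.get(field) is None:
                problems.append((pid, f"missing {field}"))
        gender = r.get("ptgender")
        if gender is not None and gender not in VALID_GENDER:
            problems.append((pid, "inconsistent ptgender"))
        for field, (low, high) in RANGES.items():
            value = r.get(field)
            if value is not None and not (low <= value <= high):
                problems.append((pid, f"extreme {field}"))
    return problems

# Hypothetical records (illustrative values only).
records = [
    {"patientid": 1, "ptgender": 1, "height": 1.65, "weight": 70.0},
    {"patientid": 2, "ptgender": 3, "height": 1.70, "weight": 82.5},   # undefined gender code
    {"patientid": 3, "ptgender": 2, "height": None, "weight": 64.0},   # missing height
    {"patientid": 4, "ptgender": 2, "height": 1.55, "weight": 180.0},  # weight above limit
    {"patientid": 4, "ptgender": 1, "height": 1.75, "weight": 90.0},   # repeated patientid
]

for pid, problem in check_records(records):
    print(pid, problem)
```

Flagged records still need a human decision: a flagged value may be a typing error to correct against the source form, or a genuine observation to keep.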

Statistical Analysis

Type of Analysis
Dependent variable: numerical data
• Two groups (independent), categorical (e.g. smokers and nonsmokers)
  - Parametric test: Independent t-test
  - Non-parametric test: Mann-Whitney test
• More than two groups (independent), categorical (e.g. Malay, Chinese and Indian)
  - Parametric test: One-way ANOVA
  - Non-parametric test: Kruskal-Wallis test
• Two groups (dependent), categorical (e.g. pre and post intervention)
  - Parametric test: Paired t-test
  - Non-parametric test: Wilcoxon Signed Rank test
• Numerical independent variable (e.g. weight in kg)
  - Parametric test: Pearson's correlation
  - Non-parametric test: Spearman's correlation

Type of Analysis
Dependent variable: categorical data
• Two groups (independent), categorical (e.g. smokers and nonsmokers)
  - Non-parametric test: Chi-square test, Fisher's exact test
• More than two groups (independent), categorical (e.g. Malay, Chinese and Indian)
  - Non-parametric test: Chi-square test, Fisher's exact test
Assumptions:
✓ The number of cells with an Expected Count (EC) less than 5 must be less than 25% of the total number of cells.
✓ The smallest EC must be at least 2.
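The two expected-count rules can be checked directly from a contingency table. The sketch below computes expected counts, tests both rules, and calculates the Pearson chi-square statistic (without continuity correction). The 2×2 counts are made up for illustration and do not come from the study; the p-value helper is valid only for 1 degree of freedom, i.e. a 2×2 table.

```python
import math

def expected_counts(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def assumptions_met(expected):
    """Fewer than 25% of cells with EC < 5, and smallest EC at least 2."""
    cells = [e for row in expected for e in row]
    share_small = sum(e < 5 for e in cells) / len(cells)
    return share_small < 0.25 and min(cells) >= 2

def chi_square(table):
    """Pearson chi-square statistic, no continuity correction."""
    expected = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))

def p_value_df1(chi2):
    """Upper-tail p-value for chi-square with 1 degree of freedom (2x2 only)."""
    return math.erfc(math.sqrt(chi2 / 2))

# HYPOTHETICAL counts, rows = male/female, columns = obese/non-obese.
observed = [[18, 29],
            [19, 34]]

expected = expected_counts(observed)
print("assumptions met:", assumptions_met(expected))
chi2 = chi_square(observed)
print(f"chi-square = {chi2:.3f}, p = {p_value_df1(chi2):.3f}")
```

When `assumptions_met` returns False, a common reading of the table above is to report Fisher's exact test instead of the chi-square test.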

Example Study: numerical data
• RQ: Is there any difference in the time spent on exercise between the obese and non-obese groups?
• Objective: To compare the mean duration of exercise between the obese and non-obese groups.
Assumption:
✓ The dependent variable should be approximately normally distributed for each category of the independent variable.

Second: test statistic using SPSS
STEP:
• Analyze >> Compare means >> Independent t test
• Analyze >> Descriptive Statistics >> Explore

Third: make a decision
• Descriptive statistics of each variable
• Levene's test result:
  - If P value > 0.05 (not significant), read the first row (Equal variances assumed).
  - If P value < 0.05 (significant), read the second row (Equal variances not assumed).
• Here Levene's test is not significant, so the first row is read.

Interpretation
Table 1: Comparing mean duration of exercise between obese (n=37) and non-obese (n=63) respondents
"An independent-samples t-test indicated that the duration of exercise was not significantly different between obese (Mean=16.7, SD=4.83) and non-obese (Mean=15.8, SD=3.88) respondents, t(98)=1.06, p=.291. Therefore, there is no significant association between duration of exercise and obesity."
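The reported statistic can be approximately reconstructed from the summary figures in the interpretation. The sketch below computes t and its degrees of freedom from means, SDs and group sizes, with a flag for the two output rows ("Equal variances assumed" uses the pooled variance; "Equal variances not assumed" is Welch's test). Because the inputs are the rounded values from the table, the result is expected to be close to, but not exactly, the reported t(98)=1.06. The function name `t_from_summary` is hypothetical.

```python
import math

def t_from_summary(mean1, sd1, n1, mean2, sd2, n2, equal_var=True):
    """Return (t, df) for an independent-samples t-test from summary statistics."""
    if equal_var:
        # Pooled variance: the "Equal variances assumed" row.
        df = n1 + n2 - 2
        pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
        se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    else:
        # Welch's test: the "Equal variances not assumed" row.
        v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
        se = math.sqrt(v1 + v2)
        df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return (mean1 - mean2) / se, df

# Rounded summary values from Table 1: obese vs non-obese.
t, df = t_from_summary(16.7, 4.83, 37, 15.8, 3.88, 63)
print(f"equal variances assumed:     t({df}) = {t:.2f}")

t_w, df_w = t_from_summary(16.7, 4.83, 37, 15.8, 3.88, 63, equal_var=False)
print(f"equal variances not assumed: t({df_w:.1f}) = {t_w:.2f}")
```

The p-value is then read from a t distribution with the computed degrees of freedom, which statistical software does automatically.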

Example Study: categorical data
• Comparing 2 or more proportions.
• RQ: Is there any association between gender and obesity group?

Second: test statistic using SPSS
• Analyze >> Descriptive Statistics >> Crosstabs

Third: make a decision
• The smallest expected count must be at least 2.
• The percentage of cells with an expected count less than 5 must be less than 25%.

Fourth: interpretation
Table 9: Association between gender and obesity
"A Chi-square test for independence indicated that the prevalence (proportion) of obesity was not significantly different between males and females (P=0.753). Therefore, there is no significant association between gender and obesity."

Statistical Procedure
1. Sample Size
2. Data Collection
3. Data Quality
4. Statistical Analysis
5. Interpretation

Thank You