BEST PRACTICES FOR STATISTICS
BEST PRACTICES • Know what you know and what you don’t know • Have a comparison group • Use validated measures • Have a Data Entry Plan • Get to know your data • If it doesn’t fit, change it • Place your bets before you collect the data • Use the best methods of analysis for your question & your data • Go beyond the p-value
What is Statistics? • Study of Data • Collecting • Organizing • Summarizing • Analyzing • Presenting • Storing & Sharing Why is it Important? • Make sense of the data • Explain what happens and (possibly) why • Make sound decisions • To know how close we are to the truth.
PURPOSE OF STATISTICS Bias? Other Factors? Sampling Error? Results Random Error? Invalid Measures?
BEST PRACTICE: KNOW WHAT YOU ALREADY KNOW, WHAT YOU WANT TO KNOW AND WHAT YOU DON’T KNOW
STARTING WITH YOUR RESEARCH QUESTION How do users differ when (searching, finding, selecting) (articles, books, Web sites)? What are factors associated with ______? What are the effects of ______On ______? Which is better at improving _____? How are people (finding, selecting, using) _______?
KINDS OF VARIABLES Independent Dependent Subjects Objects Factors Outcomes Effects of… Effects on…
LEVELS OF MEASUREMENT (NOIR) Nominal • Counts by category • No meaning between the categories (Blue is not better than Red) Ordinal • Ranks • Scales • Space between ranks is subjective Interval • Integers • No baseline • Space between values is equal and objective, but discrete Ratio • Interval data with a baseline • Space between is continuous
ANOTHER WAY Qualitative • Counts by Categories • Ranks • Scales Quantitative • Measurements • Composite scores • Simple Counts
LIKERT-TYPE SCALE? Ordinal? Interval? Arbitrary Symmetrical Few Levels Many Levels Individual Questions Composite Score
BEST PRACTICE: HAVE A COMPARISON GROUP
WAYS OF COMPARING… Time Periods Other Libraries National Surveys Patron Types Material Types
KINDS OF COMPARISON Expected ranks or ratios • Qualitative • Comparison Two variables • Quantitative • Correlations Samples or Groups • Quantitative or Qualitative • Paired or Not Paired
BEST PRACTICE: USE A VALID MEASURE
VALIDITY OF MEASURES Are you actually measuring what you are trying to measure?
USE A TOOL WITH ESTABLISHED VALIDITY User Engagement Scale (UES) Approaches and Study Skills Inventory for Students (ASSIST)
ESTABLISH VALIDITY OF MEASURES Reliability • Consistency Content or Face Validity • Common sense Construct Validity • Based on theory Criterion Validity • Comparison with other valid measures
BEST PRACTICE: HAVE A DATA PLAN
GOAL OF DATA COLLECTION IN STATISTICS Reliability Bias
BIAS Systematic (not random) deviation from the true value (Statistics.com) Selection Bias Measurement • Observer Bias • Non-response Bias Analysis Bias
DATA INPUT Have a data entry plan Train the inputters Use data validation tricks Double-entry
BEST PRACTICE: GET TO KNOW YOUR DATA
EXPLORATORY DATA ANALYSIS Central Tendency Error Spread
MEASURES OF CENTRAL TENDENCY Mean • Average • For Quantitative data • Excel function: =AVERAGE(range) Median • Middle • For Quantitative or Rank data • Excel function: =MEDIAN(range) Mode • Most common • Primarily for Qualitative data • Excel function: =MODE(range)
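For readers working outside Excel, the three measures above can be sketched with Python's standard `statistics` module. The data values and variable names here are hypothetical, chosen only for illustration.

```python
import statistics

# Hypothetical sample: years of service for nine staff members.
years = [1, 2, 2, 3, 5, 7, 8, 12, 30]

mean_years = statistics.mean(years)      # comparable to =AVERAGE(range)
median_years = statistics.median(years)  # comparable to =MEDIAN(range)
mode_years = statistics.mode(years)      # comparable to =MODE(range)

# The mode also works for qualitative (category) data.
patron_types = ["student", "faculty", "student", "staff", "student"]
mode_patron = statistics.mode(patron_types)

print(mean_years, median_years, mode_years, mode_patron)
```

Note how the single large value (30) pulls the mean well above the median, which is one reason to look at both.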
SPREAD & DISTRIBUTION
DISTRIBUTION OR SPREAD OF QUALITATIVE DATA Tables • Counts • Percentages/Ratios • Averages of Counts Excel • Pivot Tables
PIVOT TABLES IN EXCEL Select Data • Highlight table • Insert->Pivot Table Select Variables • Categories (Row Labels) • Values Change Settings • Percentage of Grand Total • Average
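A rough equivalent of the pivot-table summary (counts per category and percentage of grand total) can be sketched with `collections.Counter`. The records below are hypothetical, and this is only one possible way to tabulate them.

```python
from collections import Counter

# Hypothetical records: one patron type per transaction.
records = ["student", "faculty", "student", "staff",
           "student", "faculty", "student", "staff"]

counts = Counter(records)                # category -> count (the "Row Labels")
total = sum(counts.values())             # grand total
percent_of_total = {k: 100 * v / total for k, v in counts.items()}

# Print a small table: category, count, percentage of grand total.
for category, count in counts.most_common():
    print(f"{category:<10}{count:>5}{percent_of_total[category]:>8.1f}%")
```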
DEMONSTRATION OF PIVOT TABLES FOR SPREAD OF QUALITATIVE DATA
GRAPH & CHART RULES OF THUMB Trends Categorical Connection across the X-axis Comparisons Few Categories Differences are Wide Grouped Stacked Relative Stacked
QUANTITATIVE DISTRIBUTIONS Stem & Leaf Histogram Distribution graphs
EXPLORATORY DATA ANALYSIS John W. Tukey Exploratory Data Analysis Examining your data visually. § Stem & Leaf § Hinges § Box plots § Scatter plots, etc.
STEM-AND-LEAF
Years at UNT: 0 1 1 1 2 2 2 2 3 4 4 5 5 5 6 6 6 7 7 7 8 8 11 11 12 12 13 13 13 15 16 17 17 18 18 19 29 29 30 32 34 35
First digit(s) = Stem; last digit = Leaf
Stem  Leaf
0     0111222222233333344445556666677788899
1     000011122223333356778899
2     00122234444799
3     0245
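A stem-and-leaf display can be built in a few lines. The sketch below uses the list of values printed on the slide and assumes whole-number values below 100, so the tens digit is the stem and the ones digit is the leaf. The function name `stem_and_leaf` is an illustrative choice.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group sorted values by tens digit (stem), keeping ones digits (leaves)."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows[stem].append(str(leaf))
    return {stem: "".join(leaves) for stem, leaves in rows.items()}

years = [0, 1, 1, 1, 2, 2, 2, 2, 3, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8,
         11, 11, 12, 12, 13, 13, 13, 15, 16, 17, 17, 18, 18, 19,
         29, 29, 30, 32, 34, 35]

plot = stem_and_leaf(years)
for stem in sorted(plot):
    print(f"{stem} | {plot[stem]}")
```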
FROM STEM-AND-LEAF TO HISTOGRAMS
Stem  Leaf                             Count
0     1122223334445555666666677777899  31
1     00001112222333346677889          27
2     0122234468                       10
3     1112355888                       11
4     12                               2

Range   Count
0-9     31
10-19   27
20-29   10
30-39   11
40-49   2

[Figure: Histogram of Years at UNT, counts by range (0-9 through 40-49)]
HISTOGRAMS IN EXCEL Analysis ToolPak • Options • Add-ins • Manage Add-ins Set ranges • Equal Size Ranges • Ceiling (“More”) • For Histogram: 9 19 29 39 49 Create Histogram • Data Analysis • Histogram Create Graph • Highlight histogram • Insert Bar Chart • Select bars & Format Selection • Gap Width=0%
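The binning step can be sketched as follows, using the same ceilings as the slide (9, 19, 29, 39, 49) plus a "More" bin. A value is counted in the first bin whose ceiling it does not exceed. The data values are hypothetical.

```python
from bisect import bisect_left

ceilings = [9, 19, 29, 39, 49]   # upper bound of each equal-size range
labels = ["0-9", "10-19", "20-29", "30-39", "40-49", "More"]

# Hypothetical years-of-service values.
years = [0, 3, 9, 10, 12, 19, 25, 31, 40, 52]

counts = [0] * len(labels)
for y in years:
    # bisect_left finds the first ceiling >= y; values above 49 land in "More".
    counts[bisect_left(ceilings, y)] += 1

# A text histogram: one '#' per observation.
for label, count in zip(labels, counts):
    print(f"{label:<6} {'#' * count} ({count})")
```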
DEMONSTRATION OF HISTOGRAM IN EXCEL
SPREAD OF QUANTITATIVE DATA How variable is the data? Range Quantiles Standard Deviation
RANGE & QUARTILES
PRESENTATION OF SPREAD Box plots § Median § Upper & lower quartiles § Outliers
STANDARD DEVIATION Measure of dispersion of data Square root of the average squared deviation from the mean (the variance)
WHAT DOES THE SD TELL YOU? Greater variation, less certainty Lower variation, more certainty
SPREAD IN EXCEL Range • MIN(range) • MAX(range) Quantiles • PERCENTILE.INC(range, %) • QUARTILE.INC(range, {1, 2, 3, 4}) Standard Deviation • STDEV.S(range)
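The same spread measures can be sketched in Python. Here `statistics.quantiles` with `method="inclusive"` is used as the counterpart of QUARTILE.INC, and `statistics.stdev` (sample standard deviation, dividing by n - 1) as the counterpart of STDEV.S. The data values are hypothetical.

```python
import statistics

# Hypothetical sample of eight measurements.
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)       # MAX(range) - MIN(range)

# Three cut points: first quartile, median, third quartile.
quartiles = statistics.quantiles(data, n=4, method="inclusive")

# Sample standard deviation: square root of the sum of squared
# deviations from the mean, divided by (n - 1).
sd = statistics.stdev(data)

print(data_range, quartiles, round(sd, 3))
```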
NORMAL DISTRIBUTION
SKEWED DISTRIBUTIONS
DEMONSTRATION OF DISTRIBUTIONS Distribution of the Population The “Truth” N is the # of samples n is the number of items in each sample Watch the cumulative mean & medians slowly merge to the population
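A demonstration of this kind can be approximated with a short simulation. This is a sketch under stated assumptions: the population is a hypothetical right-skewed set of 10,000 values, N and n follow the slide's definitions, and only the means are tracked.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical right-skewed population ("the truth"): 10,000 values.
population = [random.expovariate(1 / 10) for _ in range(10_000)]
population_mean = statistics.mean(population)

N = 2000  # number of samples drawn
n = 30    # number of items in each sample

# Mean of each sample of n items drawn from the population.
sample_means = [statistics.mean(random.sample(population, n)) for _ in range(N)]

# Cumulative mean of the sample means after 10, 100, and N samples.
for k in (10, 100, N):
    print(k, round(statistics.mean(sample_means[:k]), 2))
print("population mean:", round(population_mean, 2))
```

As more samples accumulate, the running mean of the sample means should settle close to the population mean, and the sample means vary far less than the individual values do.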
BEST PRACTICE: IF IT DOESN’T FIT, CHANGE IT Transformation of data
WHY TRANSFORM? [Figure: histogram of Years at UNT (ranges 0-9, 10-19, 20-29, 30-39) beside a histogram of Log 10(Years at UNT)]
HOW TRANSFORMATION WORKS Y=a+bx Log(Y)=Log(a+bx) 1/Y = 1/(a+bx)
HOW TO BECOME NORMAL Evaluate the distribution of raw data Select a transformation method Transform the data Normally Distributed? Statistically Test Transformed Data Express the result in the terms of the transformation
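The transform, analyze, and back-transform steps above can be sketched with a base-10 log transformation. The data are hypothetical, and the sketch assumes every value is greater than zero (the log of zero is undefined, so zero values would need separate handling).

```python
import math
import statistics

# Hypothetical right-skewed data, all values > 0.
years = [1, 2, 3, 5, 8, 13, 21, 40]

# Step 1: transform the data.
logged = [math.log10(y) for y in years]

# Step 2: summarize on the transformed scale.
mean_log = statistics.mean(logged)

# Step 3: express the result in the original units. Back-transforming
# the mean of the logs gives the geometric mean, not the arithmetic mean.
back_transformed = 10 ** mean_log

print(round(statistics.mean(years), 2), round(back_transformed, 2))
```

The back-transformed value is smaller than the arithmetic mean because the log scale reduces the influence of the largest values.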
BEST PRACTICE: PLACE YOUR BETS BEFORE YOU START
INFERENTIAL STATISTICS Tests of hypotheses • Associations • Expectations Accounts for uncertainty • Random error • Confidence interval
HYPOTHESIS TESTING Your Hypothesis (H1) Null Hypothesis (H0)
EXAMPLE HYPOTHESIS UNT Libraries provides access to… >=75%* <75%* *…of journal articles cited by UNT PACS faculty in journal articles published between 2008-2011.
HYPOTHESIS TESTING Sample Size Significance Level Distribution Central Tendency p Spread
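To show how these ingredients fit together, here is a sketch of a one-sided exact binomial test for a hypothesis like the 75% access example. Several things are assumptions made for illustration only: the null hypothesis is taken to be an access rate of 75%, the alternative is a rate below 75%, and the counts (200 cited articles sampled, 138 accessible) are invented.

```python
from math import comb

def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n_cited = 200        # hypothetical sample size
n_accessible = 138   # hypothetical number of cited articles with access
p_null = 0.75        # assumed null hypothesis: access rate is 75%
alpha = 0.05         # significance level chosen before data collection

observed = n_accessible / n_cited

# Lower-tail p-value: probability of seeing this few accessible
# articles (or fewer) if the true rate really were 75%.
p_value = binomial_cdf(n_accessible, n_cited, p_null)
reject_null = p_value < alpha

print(f"observed rate = {observed:.2f}, p-value = {p_value:.4f}, reject H0: {reject_null}")
```

Setting `alpha` and the direction of the test before looking at the data is the "place your bets" step.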
TESTING HYPOTHESES
BEST PRACTICE: CHOOSE THE BEST METHOD FOR YOUR QUESTION AND DATA
KNOW THE TESTS Assumptions Limitations Appropriate data type What the tests
FACTORS ASSOCIATED WITH CHOICE OF STATISTICAL METHOD Variable Type What is being compared Independence of units Underlying variance in the population Distribution Sample size Number of comparison groups
USE A FLOW CHART
BEST PRACTICE: GOING BEYOND THE P-VALUE
AND THE P-VALUE SAYS… Much about the distributions More about the H0 than H1 Little about size of differences
MORE USEFUL STATISTICS Effect Sizes • Tell the real story Confidence Intervals • State your certainty
EFFECT SIZES OF QUANTITATIVE DATA
Correlations • Cohen’s guidelines for Pearson’s r
Effect Size  r >
Small        .10
Medium       .30
Large        .50
Differences from the mean
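Pearson's r and the Cohen labels above can be sketched as follows. The paired data are hypothetical, the function names are illustrative, and the label "below small" for values under .10 is not from the slide. Whether the cutoffs are inclusive is also an assumption here.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

def cohen_label(r):
    """Label the magnitude of r using Cohen's .10 / .30 / .50 guidelines."""
    size = abs(r)
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    return "below small"  # label assumed for illustration

# Hypothetical paired measurements.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

r = pearson_r(x, y)
print(round(r, 2), cohen_label(r))
```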
EFFECT SIZES OF QUALITATIVE DATA Based on contingency table
Odds ratio • Odds of event A divided by odds of event B • Case-control studies
Relative risk • Uses probabilities rather than odds • Experiments, RCTs
Test A/B  Yes  No  Total
Yes       10   15  25
No        50   25  75
Totals    60   40  100
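Both effect sizes can be worked through with the counts from the contingency table. The sketch assumes the rows are the two groups being compared and the "Yes" column is the event of interest. That reading of the table is an assumption, since the slide does not label it further.

```python
# Counts from the 2x2 table (row = group, column = event Yes/No).
a, b = 10, 15   # first row:  Yes, No  (row total 25)
c, d = 50, 25   # second row: Yes, No  (row total 75)

# Odds ratio: odds of the event in row 1 divided by odds in row 2.
odds_row1 = a / b
odds_row2 = c / d
odds_ratio = odds_row1 / odds_row2

# Relative risk: probability of the event in row 1 divided by
# the probability in row 2 (probabilities rather than odds).
risk_row1 = a / (a + b)
risk_row2 = c / (c + d)
relative_risk = risk_row1 / risk_row2

print(round(odds_ratio, 3), round(relative_risk, 3))
```

The two measures differ noticeably here (about 0.33 versus 0.60) because the event is common, which is why it matters to report which one was used.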
CONFIDENCE INTERVALS Point estimates • Single value • Mean Intervals • Degree of uncertainty • Range of certainty around the point estimate Based on • Point estimate (e.g. mean) • Confidence level (usually .95) • Standard deviation Expressed as: • The mean score of the students who had the IL training was 83.5, with a 95% CI of 78.3 to 89.4.
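A confidence interval for a mean can be sketched from the three ingredients listed above, plus the sample size. This version uses the normal approximation (a z critical value). For small samples a t critical value would usually be preferred, which the standard library does not provide. The scores and the function name `mean_ci` are hypothetical.

```python
import math
import statistics
from statistics import NormalDist

def mean_ci(values, level=0.95):
    """Return (mean, lower, upper) using a normal-approximation interval."""
    n = len(values)
    point = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for 95%
    return point, point - z * standard_error, point + z * standard_error

# Hypothetical test scores.
scores = [72, 78, 81, 83, 84, 85, 86, 88, 90, 93]

point, lower, upper = mean_ci(scores)
print(f"mean = {point:.1f}, 95% CI = {lower:.1f} to {upper:.1f}")
```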
STATISTICAL ANALYSIS Noise Signal
BEST PRACTICES • Know what you know and what you don’t know • Have a comparison group • Use validated measures • Have a Data Entry Plan • Get to know your data • If it doesn’t fit, change it • Place your bets before you collect the data • Use the best methods of analysis for your question & your data • Go beyond the p-value
RESOURCES Rice Virtual Lab in Statistics Excel Tutorials for Statistical Analysis Khan Academy videos Basic Research Methods for Librarians Descriptive Statistical Techniques for Librarians