Differentiating between statistical significance and substantive importance
Jane E. Miller, Ph.D.
The Chicago Guide to Writing about Multivariate Analysis, 2nd edition.
Overview
• Substantive significance defined
• Quick review of statistics
  – What questions can they answer?
  – What questions can’t they answer?
• How to implement a balanced presentation of multivariate results, covering both
  – Statistical significance
  – Substantive importance
Objective of most research papers
• Few people who write about multivariate analysis are focused solely on statistical mechanics, such as developing new computer algorithms or formal statistical tests.
  – Some statisticians and methodologists will have those interests.
• Most of us are interested in studying some relationship among social science or health concepts.
  – Test a hypothesis derived from theory or previous empirical studies.
  – Inferential statistics are a necessary tool for hypothesis testing in quantitative research.
What is substantive significance?
• Substantive significance of an association between two variables
  – “So what?”
  – “How much does it matter?”
• Real-world relevance to the topic
• In various disciplines, substantive significance = “clinically meaningful,” “economically meaningful,” or “educationally meaningful” variation.
Example: BMI & mortality
• Body mass index (BMI) shows a statistically significant positive association with mortality.
• But is that gradient substantively significant?
  – Is it worth designing an intervention to decrease BMI as a way of decreasing mortality?
Key criteria for assessing substantive significance
• Is the association causal?
  – Will changing the hypothesized cause lead to change in the purported effect?
  – Will weight loss (reduced BMI) yield lower mortality?
• Is the effect big enough to matter?
  – Is the excess mortality among overweight or obese persons large enough to justify a program?
• Can the hypothesized cause be changed?
  – Is BMI malleable?
Example prose
• “For every hour a boy played a video game, he read just two minutes less than a boy who didn’t play video games. Notably, non-gaming boys didn’t read much at all either, spending only eight minutes a day with a book.”
  – From a NYT summary of Cummings and Vandewater, 2007. “Relation of Adolescent Video Game Play to Time Spent in Other Activities,” Archives of Pediatrics and Adolescent Medicine.
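The quoted numbers invite a quick calculation of how big the 2-minute gap is relative to the baseline. The sketch below is illustrative arithmetic only. It assumes the reading gap grows linearly with hours of gaming and treats the 8-minute figure for non-gaming boys as the baseline, which simplifies the study's actual model. The names are hypothetical.

```python
# Figures from the NYT summary quoted above; the linear form is an assumption.
BASELINE_READING_MIN = 8.0   # daily reading minutes among non-gaming boys
GAP_PER_GAMING_HOUR = -2.0   # difference in reading minutes per hour of gaming


def predicted_reading(hours_gaming: float) -> float:
    """Daily reading minutes implied by a simple linear pattern (hypothetical)."""
    return BASELINE_READING_MIN + GAP_PER_GAMING_HOUR * hours_gaming


def percent_of_baseline(hours_gaming: float) -> float:
    """Reading-time gap expressed as a percentage of non-gamers' reading time."""
    gap = predicted_reading(hours_gaming) - BASELINE_READING_MIN
    return 100 * gap / BASELINE_READING_MIN


for hours in (1, 2):
    print(f"{hours} h of gaming: {predicted_reading(hours):.0f} min of reading "
          f"({percent_of_baseline(hours):+.0f}% vs. non-gamers)")
```

An absolute gap of 2 minutes sounds trivial. Against an 8-minute baseline, however, it is a 25% difference per hour of gaming, which is why the size of an effect is best judged relative to typical values of the outcome.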
Quick review of statistical significance testing
Start with a hypothesis
• In the gaming example, the authors hypothesized that the more time adolescents spent on video games, the less time they spent on homework.
  – So far, the description is purely in terms of the concepts under study.
  – No statistical jargon yet…
• To formalize this for statistical testing:
  – Homework time = dependent variable (Y)
  – Gaming time = independent variable (Xi)
  – Ha: gaming time is negatively associated with homework time.
    • In other words, Xi is inversely associated with Y.
Contrast it against the null hypothesis
• The assumption of “no difference between groups” is called the null hypothesis (H0).
• In the study on effects of gaming on homework time:
  – H0: time among gamers = time among non-gamers, OR
  – H0: time among gamers - time among non-gamers = 0
  – In words, the null hypothesis states that there is no difference in the amount of time spent on homework by gamers versus non-gamers.
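To make H0 concrete, the sketch below computes the difference in mean homework time between gamers and non-gamers, its standard error, and the ratio of the two (the test statistic). The data are invented for illustration and are not from the gaming study; the unequal-variance (Welch) standard error is one common choice among several.

```python
import math
import statistics


def difference_test(gamers: list[float], non_gamers: list[float]) -> tuple[float, float, float]:
    """Return (difference, standard error, test statistic) for H0: difference = 0."""
    diff = statistics.mean(gamers) - statistics.mean(non_gamers)
    # Unequal-variance (Welch) standard error of a difference in means.
    se = math.sqrt(statistics.variance(gamers) / len(gamers)
                   + statistics.variance(non_gamers) / len(non_gamers))
    return diff, se, diff / se


# Hypothetical daily homework minutes; NOT data from the gaming study.
gamers = [40, 35, 50, 30, 45, 38]
non_gamers = [55, 48, 60, 52, 45, 58]

diff, se, stat = difference_test(gamers, non_gamers)
print(f"difference = {diff:.1f} min, SE = {se:.2f}, statistic = {stat:.2f}")
```

A statistic far from 0 is evidence against H0. With samples this small, compare it against a t distribution rather than the large-sample cutoff of 1.96.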
What question does inferential statistics answer?
• “How likely would it be to obtain a difference at least as large as that observed between groups in the sample if in fact there is no difference between groups in the population?”
• The p-value answers that question: it is the probability of a difference at least this large if the null hypothesis were true.
  – Conventional level of “statistical significance”: p<.05. Using this cutoff limits the chance of falsely rejecting a true null hypothesis to 5%.
  – Strictly speaking, p<.05 tells us that for a large sample such as that used in the gaming study (N≈1,400), the estimated coefficient on time spent gaming is at least 1.96 times its standard error in absolute value (two-tailed test).
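The 1.96 rule can be checked directly. Under the large-sample (normal) approximation, the two-tailed p-value for a coefficient is the probability that a standard normal variable lies at least as far from zero as coef/SE. The sketch uses only Python's standard library; the coefficient and standard error values are hypothetical.

```python
import math


def two_tailed_p(coef: float, se: float) -> float:
    """Large-sample two-tailed p-value for H0: coefficient = 0."""
    z = coef / se
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical estimates with SE = 1, so each coefficient equals its z-score.
for coef in (1.96, 1.0, 2.5):
    print(f"z = {coef:.2f}: p = {two_tailed_p(coef, se=1.0):.4f}")
```

A coefficient exactly 1.96 standard errors from zero yields p≈.05. Because the test is two-tailed, the p-value ignores the sign of the estimate, and it says nothing about whether the effect is large or causal.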
What questions DOESN’T it answer?
• Whether the relationship is
  – Causal
    • Association ≠ causation
  – In the expected direction
    • The difference could be statistically significant but in the opposite of the hypothesized direction.
  – Big enough to matter in the real-world context
    • Each hour spent gaming was associated with 2 fewer minutes of reading. Is that enough to induce genuine concern from parents or teachers?
  – Malleable
Conclusion: Don’t stop at “p<.05”!
• “p<.05” answers only part of what we want to know about our research question.
  – It is a necessary but not sufficient part of statistical analysis.
• Also need to consider questions about
  – Substantive significance
    • Direction
    • Size
  – Causality
    • Non-causal associations should not be used to inform policy or program changes.
    • Confounding or spurious associations should be ruled out.
      – Often why we estimate a multivariate model.
Substantive significance overlooked
• Many statistics textbooks show how to assess and present statistical significance.
• Few if any show how to assess and present substantive significance.
Balanced presentation of statistical and substantive significance
• How to include both:
  – Inferential statistics for formal hypothesis testing.
  – Interpretation of the substantive significance of findings in the context of the specific research question.
• Critical for policy-makers and others not formally trained in statistics.
Principles for presenting results
• Name the specific variables. Avoid
  – Writing about “my dependent variable” or “the coefficient.”
  – Using acronyms from your database
• Report numbers in tables.
  – Complete set of coefficients, standard errors, goodness-of-fit statistics.
• Interpret numbers in text.
  – Incorporate units and categories for variables into the prose description.
What to report for coefficients
• Direction (AKA “sign”)
  – For categorical independent variables (IVs), which category has the higher value of the dependent variable (DV)?
  – For continuous IVs, is the trend in the DV up, down, or level?
• Magnitude
  – How big is the difference in the DV across values of the IV?
• Statistical significance
Gender as a predictor of birth weight
• Poor: “Boys weigh significantly more at birth than girls.”
  – Concepts and direction but not magnitude.
  – Statistical significance is ambiguous: Is the term “significant” intended in the statistical sense or to describe a large difference?
• Slightly better: “Gender is associated with a difference of 116.1 grams in birth weight (p<.01).”
  – Concepts, magnitude, and statistical significance but not direction: Was birth weight higher for boys or for girls?
• Best: “At birth, boys weigh on average 116 grams more than girls (p<.01).”
  – Concepts, reference category, direction, magnitude, and statistical significance.
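When many coefficients must be written up, the “Best” pattern can be templated so that no sentence omits direction, magnitude, or significance. The helper below is hypothetical, not part of any library; it assumes a nonzero coefficient from a model in which `reference` is the omitted category.

```python
def describe_difference(context: str, group: str, verb: str, reference: str,
                        coef: float, unit: str, p_text: str) -> str:
    """Sentence naming concepts, reference category, direction, magnitude, significance."""
    if coef == 0:
        raise ValueError("coefficient must be nonzero to state a direction")
    direction = "more" if coef > 0 else "less"
    return (f"{context}, {group} {verb} on average {abs(coef):.0f} {unit} "
            f"{direction} than {reference} ({p_text}).")


# Coefficient and p-value from the birth weight example above.
print(describe_difference("At birth", "boys", "weigh", "girls", 116.1, "grams", "p<.01"))
# prints: At birth, boys weigh on average 116 grams more than girls (p<.01).
```

The template guarantees the elements are present; judging whether 116 grams is substantively large still requires knowledge of typical birth weights.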
Substantive issues for coefficients on continuous predictors
• A β on a continuous independent variable measures the change in the dependent variable for a 1-unit increase in that independent variable.
  – For some variables, a 1-unit increase is too small to be substantively meaningful.
    • E.g., a $1 increase in annual per capita income in the US today.
  – For other variables, a 1-unit increase is too big to be plausible.
    • E.g., a 1-unit increase in a variable measured as a proportion, which would span its entire 0-to-1 range.
• “The Goldilocks problem”
• Need to look at the distribution of values in your data.
Solutions to Goldilocks problems
• In those cases, to assess whether a coefficient is “big” or “small,” you need a different-sized contrast.
• Important for comparing coefficients across variables.
• See related podcasts on the Goldilocks problem.
• Identifying a Goldilocks problem
• Solutions:
  – Defining variables
  – Specifying models
  – Interpreting results
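When the predictor enters the model linearly, the effect of a different-sized contrast is simply the coefficient multiplied by the size of that contrast. The sketch below shows that rescaling, plus one data-driven way to choose a contrast (the interquartile range). The coefficients and income values are hypothetical, and other contrasts (a standard deviation, a policy-relevant change) may suit a given question better. For nonlinear terms such as a quadratic, the calculation differs.

```python
import statistics


def rescaled_effect(beta_per_unit: float, contrast: float) -> float:
    """Change in the outcome for a `contrast`-unit increase, assuming a linear term."""
    return beta_per_unit * contrast


def interquartile_range(values: list[float]) -> float:
    """One data-driven contrast: the gap between the 25th and 75th percentiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


# Hypothetical coefficients; not estimates from any study.
beta_income = 0.0004  # outcome units per $1 of annual per capita income (too small a unit)
beta_share = 12.0     # outcome units per 1.0 change in a proportion (too big a unit)

print(round(rescaled_effect(beta_income, 10_000), 2))  # per $10,000 increase
print(round(rescaled_effect(beta_share, 0.10), 2))     # per 0.10 (10 percentage points)

# Hypothetical incomes, used only to illustrate an IQR-based contrast.
incomes = [18_000, 24_000, 31_000, 38_000, 45_000, 52_000, 67_000, 90_000]
print(round(rescaled_effect(beta_income, interquartile_range(incomes)), 2))
```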
Substantive significance in the discussion
• Place findings back in the broader perspective of the original research question.
• Do they correspond to your hypothesis in terms of
  – Direction (sign) of the effect?
  – Size?
  – Was the effect size attenuated when potential confounders or mediators were introduced into the model?
• What is the evidence for a causal relationship?
  – If not causal, what explains the association?
  – If causal, what are the implications for policy, programs, etc.?
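One simple way to describe attenuation is the percentage change in the key coefficient between a model without and a model with the added confounders or mediators. This is a descriptive summary, not a formal test, and the two coefficients below are hypothetical.

```python
def percent_attenuation(beta_without: float, beta_with: float) -> float:
    """Percent of the original coefficient removed by adding covariates.

    Positive values mean the association shrank toward zero; values above 100
    mean the sign reversed; negative values mean the association grew.
    """
    if beta_without == 0:
        raise ValueError("coefficient without covariates must be nonzero")
    return 100 * (beta_without - beta_with) / beta_without


# Hypothetical gaming coefficients (minutes of homework per hour of gaming).
print(percent_attenuation(-2.0, -1.5))
```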
Substantive issues from gaming study
• “But the meaning of the finding [that girls who are gamers spend less time than non-gamers on homework] is not clear, as high-academic achievers often spend less time on homework as well.”
  – Places the finding in broader context by discussing other correlates of homework time.
• “Although only a small % of girls played video games, our findings suggest that gaming may have different social implications for boys than for girls.”
  – Raises the question of selection effects: which girls play video games, and do their other characteristics affect how they spend their time?
Relate findings to previous studies
• Are your findings consistent with the published literature on the subject in terms of statistical significance, sign, and approximate size?
• If not, why not?
  – Different sample (place, time, subgroup)
  – Different data source or study design
  – Different model specification
    • Included potential confounders not previously analyzed.
    • Tested for possible mediating effects of one or more factors.
Statistical significance in the discussion
• Describe in words, not numbers.
  – No detailed standard errors, p-values, or test statistics.
• Focus on the purpose of the statistical tests
  – Did the main variable of interest increase the proportion of variance explained by the model?
  – Did some other variable “explain” the association between your key variable and the outcome?
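Behind the first question is usually a comparison of nested models: the change in R² when the key variable is added, and an incremental F statistic testing whether that change differs from zero. The sketch assumes OLS models fit to the same N observations; the R² values and model sizes are hypothetical. In the discussion itself, the conclusion would be reported in words rather than as these numbers.

```python
def incremental_f(r2_reduced: float, r2_full: float, n: int,
                  k_full: int, q: int) -> tuple[float, float]:
    """Return (change in R-squared, F statistic) for adding q predictors.

    k_full counts predictors in the full model, excluding the intercept.
    Under H0 and the usual OLS assumptions, the statistic follows an
    F distribution with q and n - k_full - 1 degrees of freedom.
    """
    delta_r2 = r2_full - r2_reduced
    df_resid = n - k_full - 1
    f_stat = (delta_r2 / q) / ((1 - r2_full) / df_resid)
    return delta_r2, f_stat


# Hypothetical fits: adding gaming time (q = 1) to a model with 4 controls, N = 1,400.
delta, f_stat = incremental_f(r2_reduced=0.20, r2_full=0.25, n=1400, k_full=5, q=1)
print(f"Change in R-squared = {delta:.3f}, F(1, 1394) = {f_stat:.1f}")
```

If SciPy is available, `scipy.stats.f.sf(f_stat, q, df_resid)` gives the corresponding p-value.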
Summary
• Emphasize the substantive issues behind the statistical analyses.
  – Design the specification to match topic and data.
  – Choose plausible, relevant numeric contrasts.
• Aim for a balanced presentation of statistical significance and substantive importance.
  – Use prose to ask and answer the research question.
  – Use tables to report comprehensive, detailed statistics.
  – Use charts if needed to convey complex patterns.
Suggested resources
• Chapter 3 (Statistical significance, substantive significance, and causality) in
  – Miller, J. E., 2004. The Chicago Guide to Writing about Numbers, OR
  – Miller, J. E., 2013. The Chicago Guide to Writing about Multivariate Analysis, 2nd edition. (“WAMA II”)
• Miller, J. E., and Y. V. Rodgers, 2008. “Economic Importance and Statistical Significance: Guidelines for Communicating Empirical Research.” Feminist Economics 14(2): 117-149.
• Chapter 10 (Goldilocks problem) in WAMA II.
Suggested online resources
• Podcasts on
  – Comparing two numbers or series
  – Reporting coefficients from OLS and logit models
  – Defining the Goldilocks problem
  – Resolving the Goldilocks problem: Presenting results
Suggested practice exercises
• Study guide to The Chicago Guide to Writing about Multivariate Analysis, 2nd edition
  – Questions #2 and #4 from the problem set for chapter 3
  – Suggested course extensions for chapter 3
    • “Reviewing” exercises #1-4
    • “Writing and revising” exercises #1 and #2
Contact information
Jane E. Miller, Ph.D.
jmiller@ifh.rutgers.edu
Online materials available at http://press.uchicago.edu/books/miller/multivariate/index.html