
On False Research Findings
Jim Ohlson
June 5, 2020

Motivating Literature
• Ioannidis (2007): "More than half of all published research findings are false."
• Star-gazing (reading significance stars) is never sufficient, though perhaps sometimes necessary. Without a strong prior, the variable almost surely does not have a material effect.
• Nate Silver, "The Signal and the Noise" (New York Times bestseller list): "Classical statistics" à la Fisher makes little sense, at least in the real world.
• Campbell Harvey, finance professor, presidential address: "P-hacking is pervasive and it seriously undermines the integrity of empirical finance."
• Leamer (1976): "To pick a t-stat cutoff (significance level) without reference to N makes no logical sense."
• Relatedly, the Jeffreys-Lindley paradox: "The bigger the data set, the less effective significance tests are at spotting fluke findings."
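One way to make the Leamer and Jeffreys-Lindley point concrete is the large-sample BIC (Schwarz) approximation to the Bayes factor for a single coefficient, BF01 ≈ sqrt(N) × exp(-t²/2). This approximation is a standard textbook device swapped in here; it does not come from the slides. It also drops prior-dependent constants, so treat its numbers as orders of magnitude. The sketch holds t fixed at 2.8 and lets N grow. The function names are hypothetical.

```python
import math


def bic_bayes_factor_null(t_stat: float, n: int) -> float:
    """Approximate Bayes factor in favour of H0: coefficient = 0.

    Large-sample BIC (Schwarz) approximation for one restriction:
    BF01 ~= sqrt(n) * exp(-t^2 / 2). Values above 1 favour the null.
    """
    return math.sqrt(n) * math.exp(-t_stat ** 2 / 2)


def neutral_t(n: int) -> float:
    """|t| at which the approximation gives BF01 = 1, i.e. sqrt(ln n)."""
    return math.sqrt(math.log(n))


if __name__ == "__main__":
    # Same t-statistic, growing sample: the approximate evidence drifts toward H0.
    for n in (100, 1_000, 40_000, 1_000_000):
        print(
            f"N={n:>9,}  BF01 at t=2.8: {bic_bayes_factor_null(2.8, n):6.2f}  "
            f"neutral |t|: {neutral_t(n):.2f}"
        )
```

Under this approximation, t = 2.8 with N = 40,000 gives a BF01 of roughly 4, which is mild evidence for no effect. The two-sided p-value is nonetheless below 0.01. The |t| at which the evidence is neutral, sqrt(ln N), is about 3.26.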

TWO DICEY QUESTIONS
• First, do we ("accounting scholars") really care whether or not a paper states a takeaway that is warranted? My best guess: neither producers, reviewers, nor consumers lose any sleep over this matter. Practical implication: papers in A-journals are presumed correct.
• Second, given my answer to the first question, should it not then also be the case that ethics becomes an issue? My best guess: it follows.
• Issue: Can one spot the likely rotten apples? What are the "diagnostic tools"?

A Few Basic Diagnostics
(i) A critical RHS variable is missing from the table of descriptive statistics. Interaction effects are typically missing as well.
(ii) The increase in R-square is trivial when the MVIs are added to the regression's RHS ("trivial" = the first three digits are identical). (Note: this information has disappeared entirely over the past 10-plus years.)
(iii) Control variables: the paper refers to the prior literature but without a specific reference. There may be "strange" variables, and obviously relevant RHS variables may be missing. Also, coefficients with the wrong signs that are nonetheless statistically significant. (The sign shifts, but the significance and overall R-square do not.)
NOWADAYS: Papers rarely raise the correct-sign issue. "Consistent with the prior literature…"
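When the data are at hand, diagnostic (ii) can be checked directly. The sketch below is a minimal NumPy version that rests on three assumptions:

- The regression is plain OLS with an intercept.
- The MVI enters as a single column.
- "First three digits identical" means the first three decimal places agree after truncation.

The function names and the synthetic demo data are hypothetical.

```python
import numpy as np


def r_squared(y: np.ndarray, X: np.ndarray) -> float:
    """R^2 of an OLS regression of y on X, with an intercept added here."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return float(1.0 - resid @ resid / np.sum((y - y.mean()) ** 2))


def mvi_r2_check(y, controls, mvi, digits: int = 3):
    """Return (trivial, r2_without_mvi, r2_with_mvi).

    'trivial' is True when the two R^2 values agree in their first
    `digits` decimal places (truncated, not rounded).
    """
    r2_without = r_squared(y, controls)
    r2_with = r_squared(y, np.column_stack([controls, mvi]))
    scale = 10 ** digits
    trivial = np.floor(r2_without * scale) == np.floor(r2_with * scale)
    return bool(trivial), r2_without, r2_with


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 40_000
    controls = rng.normal(size=(n, 3))
    mvi = rng.normal(size=n)
    # Hypothetical data: the MVI has a tiny true effect relative to the controls.
    y = controls @ np.array([1.0, 0.5, -0.5]) + 0.01 * mvi + rng.normal(size=n)
    print(mvi_r2_check(y, controls, mvi))
```

Truncation rather than rounding is a judgment call. With rounding, two R-squares that straddle a rounding boundary would be reported as different even when the gap is tiny.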

"THE CLEVER ONE"
(iv) There are two steps to implement this diagnostic.
First, rank the bivariate correlations; that is, rank each and every independent variable by its simple correlation with the dependent variable. The table of descriptive statistics should provide the relevant data. It may, for example, be the case that the correlation between the MVI and the dependent variable ranks, say, number 10 out of a total of, say, 15 independent variables (excluding the FEs).
Second, rank the variables by t-statistic (absolute values, to be sure).
Issue: For the MVI, which ranks better, the simple correlation or the t-statistic?
Note: My null hypothesis: in a world without screen-picking, this is a fifty-fifty proposition.
Note: Over the past few years, a majority of papers no longer report t-statistics (they report p-value inequalities or standard errors instead).
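Diagnostic (iv) needs only numbers a reader can copy from a paper's tables, so the sketch below uses just the Python standard library. It makes a few assumptions:

- Both rankings use absolute values. The slide says this explicitly only for the t-statistics.
- Tied values share the better rank.
- A binomial tally across many papers is one hypothetical way to test the fifty-fifty null.

That tally and all function names are illustrative additions, not part of the slides. The example numbers are also hypothetical, chosen to mirror the slide's "rank 10 out of 15" case.

```python
from math import comb


def rank_by_abs(values, index):
    """1-based rank of values[index] by absolute size (1 = largest).

    Ties share the better rank ("competition" ranking).
    """
    target = abs(values[index])
    return 1 + sum(1 for v in values if abs(v) > target)


def clever_one(correlations, t_stats, mvi_index):
    """Compare the MVI's rank by |simple correlation| with its rank by |t|.

    correlations: simple correlations of each independent variable with
        the dependent variable (from the descriptive-statistics table).
    t_stats: the same variables' t-statistics, in the same order
        (fixed effects excluded).
    """
    if len(correlations) != len(t_stats):
        raise ValueError("need one correlation and one t-statistic per variable")
    corr_rank = rank_by_abs(correlations, mvi_index)
    t_rank = rank_by_abs(t_stats, mvi_index)
    if t_rank < corr_rank:
        verdict = "t-statistic ranks better"
    elif t_rank > corr_rank:
        verdict = "correlation ranks better"
    else:
        verdict = "tie"
    return corr_rank, t_rank, verdict


def binomial_upper_tail(k, m):
    """P(X >= k) for X ~ Binomial(m, 1/2): the fifty-fifty null across m papers."""
    return sum(comb(m, i) for i in range(k, m + 1)) / 2 ** m


if __name__ == "__main__":
    # Hypothetical numbers for 15 independent variables; the MVI is index 9.
    corrs = [0.41, 0.35, 0.30, 0.27, 0.22, 0.19, 0.15, 0.12, 0.10,
             0.08, 0.06, 0.05, 0.04, 0.03, 0.02]
    tstats = [12.1, 9.8, 1.9, 7.5, 0.8, 5.2, 1.1, 2.3, 3.0,
              8.4, 0.5, 1.4, 0.9, 1.7, 0.6]
    print(clever_one(corrs, tstats, mvi_index=9))  # (10, 3, 't-statistic ranks better')
```

A single comparison says little under the fifty-fifty null; the tally across papers is what carries the evidence. Suppose, hypothetically, that the t-statistic ranked better in 15 of 20 papers, with ties dropped. Then binomial_upper_tail(15, 20) is about 0.021.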

Yes, Most People Are Aware…
(vi) This diagnostic addresses the large-N issue: Is it not the case that t-statistics can look on the low side ("I am not the least impressed") given how large N is? Obvious case: N = 40,000, yet the t-statistic is, say, 2.8.
Can we firm up this idea? Rule: the effect is too modest (in my opinion) if t-statistic/sqrt(N) < 0.03.
Example: If N = 40,000, then sqrt(N) = 200, so I think the t-statistic ought to be at least 0.03 × 200 = 6. (Note: N(0, 1) is effectively the same distribution as N(0.03, 1).)
If the inequality holds, then the variable can be deleted without changing the R-square; the first three digits will remain the same.
In Johannesson, Zhai, and JO; SSRN (very soon).
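The arithmetic of rule (vi) is easy to script. The sketch below also links the rule to the R-square claim through a standard OLS identity, swapped in here rather than quoted from the slides: deleting one regressor lowers R-square by t²(1 - R²_full)/df. Here df = N - k - 1 is the full model's residual degrees of freedom, with k slope coefficients.

When |t|/sqrt(N) < 0.03 and df ≈ N, the drop is below roughly 0.0009 × (1 - R²_full). That bound is consistent with the slide's "first three digits" claim, although a value sitting near a rounding boundary can still flip the third digit. The function names and the example R-square are hypothetical.

```python
import math


def effect_too_modest(t_stat: float, n: int, cutoff: float = 0.03) -> bool:
    """Rule (vi): flag the effect when |t| / sqrt(N) falls below the cutoff."""
    return abs(t_stat) / math.sqrt(n) < cutoff


def min_t_required(n: int, cutoff: float = 0.03) -> float:
    """Smallest |t| that passes the rule: cutoff * sqrt(N)."""
    return cutoff * math.sqrt(n)


def r2_drop_if_deleted(t_stat: float, r2_full: float, df_resid: int) -> float:
    """Exact OLS identity: R^2_full - R^2_reduced = t^2 * (1 - R^2_full) / df_resid."""
    return t_stat ** 2 * (1.0 - r2_full) / df_resid


if __name__ == "__main__":
    n, k = 40_000, 15          # hypothetical sample size and number of slopes
    r2_full = 0.30             # hypothetical full-model R-square
    df = n - k - 1
    print(min_t_required(n))                           # about 6.0
    print(effect_too_modest(2.8, n))                   # True: 2.8 / 200 = 0.014
    print(round(r2_drop_if_deleted(2.8, r2_full, df), 5))  # about 0.00014
```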

TWO UNDERLYING ISSUES SET THE STAGE FOR UNWARRANTED/FALSE CONCLUSIONS
• Researchers feel pressured into posing far-fetched hypotheses; if a hypothesis is not far-fetched, then it is either "not new" or "too obvious". (Note: far-fetched hypotheses often run contrary to elementary equilibrium (rationality) concepts.)
• Generally Accepted Accounting Research Principles focus almost solely on star-gazing. This stands in sharp contrast to the so-called real sciences. (Note: in the sciences, the "RQ" (research question) issue comes up late rather than early in the paper.)

UNDERLYING ISSUES SET THE STAGE FOR UNWARRANTED/FALSE CONCLUSIONS
• We have to come to grips with the fact that many nicely crafted papers (by famous scholars) are unlikely to be valid in a substantive sense. (Ioannidis's observation: many papers that have been repudiated under the gold-standard paradigm, double-blind tests, remain part of the medical literature.)
• The use of hold-out samples would most likely reject more than half of the papers published. Who cares? How long shall "we" pretend that this is fine?
• And what about dismissing annual regressions altogether?

Thank you