Misleading or Falsification? Inferring Deceptive Strategies and Types in Online News and Social Media
Authors: S. Volkova, J. Jang
Presenter: Maria Glenski
Data Sciences and Analytics, National Security Directorate, Pacific Northwest National Laboratory
WWW Track on Journalism, Misinformation and Fact-Checking, April 25th, Lyon, France
WWW Track on Journalism, Misinformation and Fact-Checking November 25, 2020

Deceptive News Shared Online

Contributions
Our approach:
- Focusing on deception types and strategies: misleading and falsification
- Verifying model generalizability across domains: news pages, tweets, and summary statements
- Qualitatively analyzing writers' intent behind misinformation through psycholinguistic signals: moral foundations and connotations
Recent work:
- Psycholinguistic analysis across deception types in news pages (Rashkin et al., 2017)
- Predicting the credibility of PolitiFact statements (Rashkin et al., 2017; Wang et al., 2017) and analyzing the credibility of tweets (Mitra et al., 2017)
- Models to classify deceptive news types on Twitter (Volkova et al., 2017)

Deception Types and Strategies
Disinformation: false facts conveyed to deliberately deceive the audience
Deception strategies:
- Misleading: topic changes, irrelevant information, and equivocations
- Falsification: contradictions or distortions
Misinformation (in contrast to disinformation): incorrect facts relayed in the honest but mistaken belief that they are true
Propaganda: a form of persuasion that influences audiences through the controlled transmission of deceptive, one-sided messages that selectively omit information
Hoax: a type of misinformation that aims to deliberately deceive the reader

Task Definition
Build generalizable predictive models to differentiate between deception types and strategies in news across domains.
Deception strategies: misleading vs. falsification
[Figure: misleading and falsification placed on a scale from more to less intent to deceive]
Deception types: propaganda vs. hoax vs. disinformation
[Figure: propaganda, disinformation, and hoax placed on a scale from more to less intent to deceive]

Datasets: Deception Strategies
Domain        Misleading   Falsification
Summaries     616          1,376
News pages    81           85
Tweets        96           109
- Confirmed cases of disinformation summaries from the European Union's East Strategic Communications Task Force: https://euvsdisinfo.eu/ and @EUvsDisinfo
- Falsification: unprovable, no evidence, no proof, no supporting evidence
- Crowdsourcing: pairwise inter-annotator agreement kappa of 0.64 (5 annotators)
- Followed URLs in the disinformation summaries to collect the original news pages
- Queried the public Twitter API using SVO tuples and timestamps to extract unique disinformation tweets
- Parsed summaries, news pages, and tweets with SyntaxNet to extract SVO tuples, in order to understand the agents and themes of deception and to contrast connotations/perspectives across deception types
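The slide reports a pairwise inter-annotator agreement of kappa = 0.64 across 5 annotators but does not say how the pairwise scores were combined. A minimal sketch follows. It assumes Cohen's kappa is computed for every annotator pair and then averaged. The averaging step and the example label names are assumptions, not the authors' documented procedure.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two annotators who labeled the same items in order."""
    if len(a) != len(b) or not a:
        raise ValueError("annotators must label the same non-empty item list")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement from each annotator's own label distribution;
    # labels missing from one annotator contribute zero.
    expected = sum(counts_a[lab] * counts_b[lab] for lab in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label for every item
    return (observed - expected) / (1 - expected)


def mean_pairwise_kappa(annotations):
    """Average Cohen's kappa over all annotator pairs (assumed aggregation)."""
    pairs = list(combinations(annotations, 2))
    if not pairs:
        raise ValueError("need at least two annotators")
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)
```

For example, `cohen_kappa(["M", "M", "F", "F"], ["M", "F", "F", "F"])` gives 0.5. Observed agreement is 0.75 and chance agreement is 0.5. With five annotators, `mean_pairwise_kappa` averages over the 10 annotator pairs.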

Datasets: Deception Types
Domain        Propaganda   Hoaxes   Disinformation
News pages    17,872       5,297    166
Tweets        3,834        453      205
Collecting propaganda and hoax news pages and tweets:
- Downloaded 17,872 propaganda (ActivistPost) and 5,297 hoax (DCGazette) news pages
- Collected the corresponding propaganda and hoax tweets using the public Twitter API
Collecting disinformation news pages and tweets:
- Followed URLs in the disinformation summaries to collect the original news pages
- Queried the public Twitter API using SVO tuples and timestamps to extract disinformation tweets

Predictive Models and Signals
Predictive models:
- Machine learning models: Maximum Entropy (MaxEnt) and Random Forest (scikit-learn, http://scikit-learn.org/stable/)
- Neural network models: Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) (Keras, https://keras.io/)
Predictive signals:
- Content (what is being discussed): TF-IDF, dimensionality reduction, GloVe embeddings (https://nlp.stanford.edu/projects/glove/)
- Style, syntax, complexity, and readability (how the content is being discussed): Automated Readability Index (ARI), Flesch-Kincaid readability tests, Coleman-Liau index (https://github.com/nltk_contrib/tree/master/nltk_contrib/readability)
- Lexicons (how emotional and subjective the discussion is):
  - Biased language: intensifiers, dramatic adverbs, and assertive, imperative, and report verbs
  - Moral foundations: care and harm, fairness and cheating, loyalty and betrayal, authority and subversion, purity and degradation
  - Psycholinguistic signals: imperative commands, personal pronouns, emotional language, quotations, and inclusions
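The three readability measures listed above are standard closed-form formulas over counts of letters, words, sentences, and syllables. The sketch below implements them directly in plain Python instead of calling the linked nltk_contrib module. The regex tokenizers and the vowel-group syllable counter are crude assumptions, so the scores will differ somewhat from library implementations.

```python
import re

# Crude tokenizers (assumptions): words are runs of letters with an optional
# apostrophe part; sentences end at ".", "!" or "?".
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")
SENTENCE_END_RE = re.compile(r"[.!?]+")


def count_syllables(word):
    """Heuristic syllable count: vowel groups, minus a silent final 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def readability(text):
    """Return ARI, Flesch-Kincaid grade level and Coleman-Liau index."""
    words = WORD_RE.findall(text)
    if not words:
        raise ValueError("text contains no words")
    n_words = len(words)
    n_sentences = max(1, sum(1 for s in SENTENCE_END_RE.split(text) if s.strip()))
    # ARI and Coleman-Liau are computed on letters only in this sketch.
    n_letters = sum(ch.isalpha() for w in words for ch in w)
    n_syllables = sum(count_syllables(w) for w in words)

    ari = 4.71 * n_letters / n_words + 0.5 * n_words / n_sentences - 21.43
    fk_grade = 0.39 * n_words / n_sentences + 11.8 * n_syllables / n_words - 15.59
    # Coleman-Liau uses letters (L) and sentences (S) per 100 words.
    letters_per_100 = 100 * n_letters / n_words
    sentences_per_100 = 100 * n_sentences / n_words
    cli = 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8
    return {"ARI": ari, "FK_grade": fk_grade, "Coleman_Liau": cli}
```

Very short, simple text yields negative grade levels. For example, `readability("The cat sat. The dog ran.")` gives an ARI of about -5.8. Real news pages score much higher.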

Deception Strategy Classification Results: Misleading vs. Falsification
[Figure: F1 score (0.5 to 1.0) by signal type (Content, Syntax, Style, Connotations, Lexicons) for summaries, news pages, and tweets]
- The best-performing models are LSTM and MaxEnt
- The falsification strategy is easier to identify than the misleading strategy
- Deceptive strategies are easier to predict in tweets than in summaries and news pages
- Predictive signals: connotations (summaries); moral foundations, biased language, and psycholinguistic cues (news pages); syntax and connotations (tweets)
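The results are reported as F1 scores, but the slides do not say whether these are per-class or averaged. The sketch below computes per-class F1 and its unweighted (macro) average from gold and predicted labels. Which average the authors used remains an assumption. The same functions work unchanged for the three-way propaganda/disinformation/hoax setting.

```python
def f1_per_class(y_true, y_pred):
    """Per-class F1 from parallel lists of gold and predicted labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = f1_per_class(y_true, y_pred)
    if not scores:
        raise ValueError("no labels to score")
    return sum(scores.values()) / len(scores)
```

Consider gold labels `["F", "F", "F", "M", "M"]` and predictions `["F", "F", "M", "M", "F"]`. Per-class F1 is 2/3 for F and 0.5 for M, so macro F1 is 7/12, about 0.583.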

Deception Type Classification Results: Propaganda vs. Disinformation vs. Hoax
[Figure: F1 score (0.5 to 1.0) by signal type (Content, Syntax, Style, Connotations, Lexicons) for news pages and tweets]
- The best-performing model is LSTM
- Disinformation is easier to predict than propaganda or hoaxes
- Unlike deceptive strategies, deceptive news types (disinformation, propaganda, and hoaxes) are more salient and easier to identify in tweets than in news pages
- Predictive signals: content (summaries and tweets)

Connotation Analysis: Background
Identify writers' intent behind digital misinformation by analyzing psycholinguistic signals (moral foundations and connotations) extracted from different types of deceptive news

Connotation Analysis Results: Disinformation
[Figure panels: Writer → agent; Writer → theme]
Implications:
- Quantitatively demonstrate how the agents and themes of strategic deception vary across deception types
- Qualitatively identify the hidden agenda of the content
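The writer → agent and writer → theme panels summarize the perspective a writer implies toward the subject (agent) and the object (theme) of each SVO tuple. A minimal sketch follows. It assumes a connotation lexicon that maps lemmatized verbs to a pair of writer-perspective scores in [-1, 1], and it averages those scores per deception type and entity. `TOY_LEXICON`, its values, and the placeholder entity names are hypothetical and invented for illustration. They are not the lexicon or the data behind the slide.

```python
from collections import defaultdict

# Hypothetical toy lexicon: lemmatized verb -> (writer->agent, writer->theme)
# perspective scores in [-1, 1]. Values are invented for illustration only.
TOY_LEXICON = {
    "attack": (-0.6, 0.3),
    "protect": (0.5, 0.4),
    "deceive": (-0.8, 0.2),
}


def _mean(xs):
    return sum(xs) / len(xs)


def entity_perspectives(svo_tuples, lexicon=TOY_LEXICON):
    """Average writer perspective toward agents and themes.

    svo_tuples: iterable of (deception_type, subject, verb, obj), with verbs
    already lemmatized. Tuples whose verb is not in the lexicon are skipped.
    Returns two dicts keyed by (deception_type, entity): one for entities
    seen as agents (subjects), one for entities seen as themes (objects).
    """
    agent_scores = defaultdict(list)
    theme_scores = defaultdict(list)
    for dtype, subj, verb, obj in svo_tuples:
        scores = lexicon.get(verb.lower())
        if scores is None:
            continue
        to_agent, to_theme = scores
        agent_scores[(dtype, subj)].append(to_agent)
        theme_scores[(dtype, obj)].append(to_theme)
    return (
        {key: _mean(vals) for key, vals in agent_scores.items()},
        {key: _mean(vals) for key, vals in theme_scores.items()},
    )
```

Comparing the returned dictionaries across deception types (for example, disinformation vs. propaganda) shows which entities a writer tends to frame positively or negatively as actors versus as targets.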

Linguistic Realizations of Deception: Misleading vs. Falsification
[Figure panel: Tweets]
Significant differences in subjective language and moral foundations:
- Misleading statements are more subjective than falsified statements in summaries and news pages, but not in tweets
- Compared to misleading statements, falsified statements include more Harm+ and Ingroup+ signals and more affect terms in tweets
Implications:
- Build models for factuality assessment without external knowledge
- Improve fact-checking systems by going beyond fake news classification
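The slide reports significant differences in subjective language between misleading and falsified statements but does not name the statistical test. One way to check such a difference is a two-sided permutation test on the mean score. The sketch below assumes each statement is scored by the share of its tokens that appear in a subjectivity lexicon. `TOY_SUBJECTIVE` is a hypothetical stand-in for a real subjectivity lexicon, and the permutation test is a swapped-in technique, not necessarily the authors' test.

```python
import random
import re

# Hypothetical stand-in for a real subjectivity lexicon (illustration only).
TOY_SUBJECTIVE = {"truly", "terrible", "shocking", "clearly", "outrageous"}

TOKEN_RE = re.compile(r"[a-z']+")


def subjectivity_rate(text, lexicon=TOY_SUBJECTIVE):
    """Share of lowercase tokens that appear in the subjectivity lexicon."""
    tokens = TOKEN_RE.findall(text.lower())
    if not tokens:
        return 0.0
    return sum(tok in lexicon for tok in tokens) / len(tokens)


def _mean(xs):
    return sum(xs) / len(xs)


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns a p-value estimated from random relabelings of the pooled scores.
    """
    rng = random.Random(seed)
    observed = abs(_mean(group_a) - _mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(_mean(pooled[:n_a]) - _mean(pooled[n_a:]))
        # Small tolerance so float rounding does not hide ties with the observed split.
        if diff >= observed - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (extreme + 1) / (n_permutations + 1)
```

To use it, call `permutation_test(rates_misleading, rates_falsified)` on per-statement `subjectivity_rate` scores within one domain. A small p-value indicates that the two strategies differ in mean subjectivity more than random relabeling would typically produce.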

Summary and Future Work
Predictive signals:
- Content (what is being discussed) combined with moral foundations and connotations (how emotional and subjective the discussion is) is more predictive of deception strategies than style and syntax
- Content is the most predictive of deception types
Predictive models:
- LSTMs achieve higher performance than the machine learning models (MaxEnt and Random Forest)
Deception types:
- Disinformation is easier to predict than hoaxes and propaganda
Deception strategies:
- The falsification strategy is easier to infer than the misleading strategy
- Content is the most predictive of the misleading strategy across all domains

Future Work
- Multilingual, multimodal (text and images) deception classification
- Misinformation propagation and influence (deception types, languages)
- Reactions to deceptive news across platforms: Reddit and Twitter
References:
- S. Volkova, K. Shaffer, J. Jang, and N. Hodas. Separating Facts from Fiction: Linguistic Models to Classify Suspicious and Trusted News Posts on Twitter. ACL 2017.
- H. Rashkin, E. Choi, J. Jang, Y. Choi, and S. Volkova. Truth of Varying Shades: On Political Fact-Checking and Fake News. EMNLP 2017.
- M. Glenski, E. Ayton, D. Arendt, and S. Volkova. Fishing for Clickbaits in Social Images and Texts with Linguistically-Infused Neural Network Models. Proceedings of Google Clickbait Workshop, 2017.

Svitlana Volkova
Senior Research Scientist
Data Sciences and Analytics Group, Computational and Statistical Analytics Division
svitlana.volkova@pnnl.gov
http://www.cs.jhu.edu/~svitlana/