Towards Reliable Hypothesis Validation in Social Sensing Applications
Dong Wang, Daniel Zhang, Chao Huang
Department of Computer Science and Engineering, University of Notre Dame
SECON 18, Hong Kong, China
Sensing is Evolving
• Platform: sensors are increasingly used by everyday people (e.g., smart phones)
• Application: social (human-centric) sensing is emerging; humans are getting into the loop of sensing
• Example applications: health monitoring, geotagging, target tracking, environment monitoring, smart house, social sensing
Social Sensing
A set of applications where data are collected from human sources or from devices on their behalf.
Human + Cyber + Physical: an emerging paradigm of cyber-physical systems with humans in the loop
• Twitter mood predicts the stock market, 2011
• Japan tsunami and nuclear event, 2011
• Help pilgrims utilize schedule in Hajj, 2012
• Foursquare helps blind people navigate, 2012
Why Social Sensing? A Confluence of Three Trends
• Mass dissemination media
• Connectivity
• Sensors
Figure labels: smart phone, cars on Internet, smart meter, cell-phones, GPS
Truth Discovery in Social Sensing
What to believe? Who to believe?
• Sources: people, smart devices
• Measurements (claims): text, numeric data, images
• Goal: reliable information for decision support
Our Problem: Reliable Hypothesis Validation
Related Work
Truth discovery in social sensing:
• Basic model [1] (IPSN 12)
• Recursive model [2] (ICDCS 13)
• Source dependency [3, 4] (IPSN 14, IPSN 16)
• Dynamic and scalable model [5] (ICDCS 17)
• Reliable hypothesis validation [6] (SECON 18)
References:
1. Dong Wang, Lance Kaplan, Hieu Le, and Tarek Abdelzaher. "On Truth Discovery in Social Sensing: A Maximum Likelihood Estimation Approach." IPSN 12, Beijing, China, April 2012.
2. Dong Wang, Tarek Abdelzaher, Lance Kaplan, and Charu C. Aggarwal. "Recursive Fact-finding: A Streaming Approach to Truth Estimation in Crowdsourcing Applications." ICDCS 13, Philadelphia, PA, July 2013.
3. Dong Wang, Tarek Abdelzaher, and Lance Kaplan. "Humans as Sensors: An Estimation Theoretic Perspective." IPSN 14, Berlin, Germany, April 2014.
4. Chao Huang and Dong Wang. "Topic-Aware Social Sensing with Arbitrary Source Dependency Graphs." IPSN 16, Vienna, Austria, April 2016.
5. Daniel Zhang, Chao Zhang, Dong Wang, Doug Thain, Xin Mu, Greg Madey, and Chao Huang. "Towards Scalable and Dynamic Social Sensing Using A Distributed Computing Framework." ICDCS 17, Atlanta, GA, USA.
6. Dong Wang, Daniel Zhang, and Chao Huang. "Towards Reliable Hypothesis Validation in Social Sensing Applications." SECON 18, Hong Kong, June 2018.
Technical Challenges
• Challenge 1: Hypothesis-Claim Matching
– How to match the high-level hypotheses generated by end users to the relevant low-level claims generated by social sensors?
• Challenge 2: Hypothesis Validation
– How to reliably validate the truthfulness of the hypotheses from the estimated truthfulness of the claims?
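Basic Definitions
• Sources: S1, ..., SM
• Claims: C1, ..., CN
• Hypotheses: H1, ..., HK
• Claim Truthfulness Vector: one truthfulness value per claim (length N)
• Hypothesis Truthfulness Vector: one truthfulness value per hypothesis (length K)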
Basic Definitions
Source Claim Matrix: SC (M by N)
• SC(i, j) = 1: source Si reports claim Cj
• SC(i, j) = 0: source Si does not report claim Cj
• M: number of sources; N: number of claims
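The source-claim matrix defined above can be sketched in a few lines of Python. This is only an illustration: the function name `build_source_claim_matrix`, the `(source, claim)` pair format, and the toy data are assumptions made here, not part of the original slides.

```python
import numpy as np

def build_source_claim_matrix(reports, num_sources, num_claims):
    """Return the M-by-N 0/1 matrix SC; SC[i, j] is 1 if source i reported claim j."""
    sc = np.zeros((num_sources, num_claims), dtype=int)
    for source_idx, claim_idx in reports:
        # Repeated reports of the same claim by the same source still yield 1.
        sc[source_idx, claim_idx] = 1
    return sc

# Hypothetical toy data: 3 sources, 4 claims, one duplicated report.
reports = [(0, 0), (0, 2), (1, 2), (2, 1), (2, 3), (2, 3)]
SC = build_source_claim_matrix(reports, num_sources=3, num_claims=4)
```

Row i of `SC` lists the claims made by source Si, and column j lists the sources that support claim Cj.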
Basic Definitions
Claim Hypothesis Matrix: CH (N by K)
• CH(j, k): degree of correlation between claim Cj and hypothesis Hk (e.g., 0.7)
• N: number of claims; K: number of hypotheses
Our Goal
Derive the hypothesis truthfulness (output) from the estimated claim truthfulness.
Solution: Reliable Hypothesis Validation (RHV)
1. Topic Identification from Claims
2. Hypothesis Claim Matching
3. Optimal Hypothesis Validation
RHV: Topic Identification from Claims
• Objective:
– Identify important topics that provide clues to help end users generate relevant hypotheses
• Approach:
– Topic modeling with a Gibbs sampling algorithm, which estimates the claim distribution over topics and the topic distribution over words
• Output:
– T topics, each associated with a list of words that are strongly correlated with that topic
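The topic identification step can be illustrated with a small collapsed Gibbs sampler. The slides only say "topic modeling and Gibbs sampling", so the LDA-style model, the hyperparameters `alpha` and `beta`, the function name `lda_gibbs`, and the toy word-id data below are all assumptions for the sake of a sketch.

```python
import numpy as np

def lda_gibbs(docs, num_topics, vocab_size, iterations=100, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for an LDA-style topic model.

    docs: list of claims, each a list of integer word ids.
    Returns (theta, phi): claim-over-topic and topic-over-word distributions.
    """
    rng = np.random.default_rng(seed)
    doc_topic = np.zeros((len(docs), num_topics))
    topic_word = np.zeros((num_topics, vocab_size))
    topic_total = np.zeros(num_topics)
    assignments = []
    # Random initial topic assignment for every word occurrence.
    for d, doc in enumerate(docs):
        z_doc = rng.integers(num_topics, size=len(doc))
        assignments.append(z_doc)
        for w, z in zip(doc, z_doc):
            doc_topic[d, z] += 1
            topic_word[z, w] += 1
            topic_total[z] += 1
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d, z] -= 1
                topic_word[z, w] -= 1
                topic_total[z] -= 1
                # Resample the topic from its conditional distribution.
                p = (doc_topic[d] + alpha) * (topic_word[:, w] + beta) / (topic_total + vocab_size * beta)
                z = rng.choice(num_topics, p=p / p.sum())
                assignments[d][i] = z
                doc_topic[d, z] += 1
                topic_word[z, w] += 1
                topic_total[z] += 1
    theta = (doc_topic + alpha) / (doc_topic + alpha).sum(axis=1, keepdims=True)
    phi = (topic_word + beta) / (topic_word + beta).sum(axis=1, keepdims=True)
    return theta, phi

# Hypothetical toy corpus: 3 claims over a 4-word vocabulary.
docs = [[0, 1, 0, 1], [2, 3, 2, 3], [0, 1, 1, 0]]
theta, phi = lda_gibbs(docs, num_topics=2, vocab_size=4, iterations=50)
```

The highest-probability words in each row of `phi` would play the role of the word list shown to end users for each topic.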
RHV: Hypothesis Claim Matching
• Objective:
– Match each hypothesis from end users to the most relevant claims that can be used to validate its correctness
• Approach:
– Compute the similarity between a hypothesis and each claim
• Semantic similarity (words)
• Syntactic similarity (order of words)
• Overall claim-hypothesis similarity
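A minimal sketch of the two similarity components follows. The exact measures used by RHV are not given on the slide, so the choices here are stand-ins: bag-of-words cosine similarity for the word-level (semantic) part, a word-order vector comparison for the syntactic part, and a hypothetical mixing weight `delta` for the overall score.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def word_similarity(a, b):
    """Cosine similarity of bag-of-words counts (stand-in for semantic similarity)."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

def order_similarity(a, b):
    """Word-order similarity over shared words: 1 - |r1 - r2| / |r1 + r2|."""
    ta, tb = tokenize(a), tokenize(b)
    shared = [w for w in dict.fromkeys(ta) if w in tb]
    if not shared:
        return 0.0
    # 1-based position of the first occurrence of each shared word.
    r1 = [ta.index(w) + 1 for w in shared]
    r2 = [tb.index(w) + 1 for w in shared]
    diff = math.sqrt(sum((x - y) ** 2 for x, y in zip(r1, r2)))
    total = math.sqrt(sum((x + y) ** 2 for x, y in zip(r1, r2)))
    return 1.0 - diff / total

def claim_hypothesis_similarity(claim, hypothesis, delta=0.8):
    """Linear mix of the two components; delta is an assumed weight."""
    return delta * word_similarity(claim, hypothesis) + (1 - delta) * order_similarity(claim, hypothesis)

same = claim_hypothesis_similarity("police shot suspect", "police shot suspect")
swapped = claim_hypothesis_similarity("police shot suspect", "suspect shot police")
disjoint = claim_hypothesis_similarity("police shot suspect", "bridge is closed")
```

The swapped sentence uses the same words in a different order, so only the order component lowers its score. Scores of this kind could populate the entries of the CH matrix.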
RHV: Hypothesis Claim Matching
• Approach:
– Critical Claim Selection: maximize the relevance between claims and the hypothesis while minimizing the dependency between the selected claims
– Solve the resulting constrained multi-objective optimization problem using a linear combination of the objectives
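One way to picture the linear combination of the two objectives is a greedy selection that rewards relevance and penalizes dependency on claims already chosen. This is a heuristic sketch, not the solver used in the paper; the `trade_off` weight, the `budget` constraint, and the toy numbers are assumptions.

```python
import numpy as np

def select_critical_claims(relevance, dependency, budget, trade_off=0.5):
    """Greedily pick up to `budget` claims for one hypothesis.

    relevance[j]: similarity between claim j and the hypothesis.
    dependency[j, s]: dependency between claims j and s (symmetric).
    Score = trade_off * relevance - (1 - trade_off) * worst dependency on chosen claims.
    """
    relevance = np.asarray(relevance, dtype=float)
    dependency = np.asarray(dependency, dtype=float)
    selected = []
    candidates = set(range(len(relevance)))

    def score(j):
        penalty = max((dependency[j, s] for s in selected), default=0.0)
        return trade_off * relevance[j] - (1 - trade_off) * penalty

    while candidates and len(selected) < budget:
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Hypothetical toy data: claims 0 and 1 are near-duplicates, claim 2 is independent.
relevance = [0.9, 0.85, 0.4]
dependency = [[0.0, 0.95, 0.05],
              [0.95, 0.0, 0.05],
              [0.05, 0.05, 0.0]]
balanced = select_critical_claims(relevance, dependency, budget=2, trade_off=0.5)
relevance_only = select_critical_claims(relevance, dependency, budget=2, trade_off=1.0)
```

With the balanced weight the second pick skips the near-duplicate claim in favor of the independent one; with `trade_off=1.0` dependency is ignored and the two most relevant claims are chosen.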
RHV: Optimal Hypothesis Validation
• Objective:
– Validate the truthfulness of hypotheses from the estimated truthfulness of the identified critical and relevant claims
• Approach:
– Claim Truthfulness Estimation
• Truth discovery solutions
– Optimal Hypothesis Validation
• Reliable hypothesis validation scheme
An Example of Truth Discovery Solutions: Expectation Maximization
• Observed data X: sensing observations
• Hidden variables Z = {z1, z2, ..., zN}: correctness of each claim
• Estimation parameter: estimated by applying EM
• EM alternates between the Expectation step (E-step) and the Maximization step (M-step)
• Result: the MLE of the estimation parameter and the values of the hidden variables
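The EM loop can be sketched on the source-claim matrix as follows, in the spirit of the maximum likelihood approach of reference [1]. The parameterization (a[i] and b[i] as the probabilities that source i reports a claim given it is true or false, d as the prior that a claim is true), the vote-fraction initialization, and the clipping constants are assumptions of this sketch rather than details taken from the slides.

```python
import numpy as np

def em_truth_discovery(sc, iterations=50, eps=1e-6):
    """Estimate claim truthfulness from an M-by-N source-claim matrix via EM.

    Returns (z, a, b): z[j] is the posterior probability that claim j is true;
    a[i] / b[i] are P(source i reports a claim | claim is true / false).
    """
    sc = np.asarray(sc, dtype=float)
    # Assumed initialization: fraction of sources supporting each claim.
    z = np.clip(sc.mean(axis=0), eps, 1 - eps)
    for _ in range(iterations):
        # M-step: re-estimate source parameters and the prior from current z.
        a = np.clip(sc @ z / z.sum(), eps, 1 - eps)
        b = np.clip(sc @ (1 - z) / (1 - z).sum(), eps, 1 - eps)
        d = np.clip(z.mean(), eps, 1 - eps)
        # E-step: posterior of each hidden variable z[j], computed in log space.
        log_true = np.log(d) + sc.T @ np.log(a) + (1 - sc).T @ np.log(1 - a)
        log_false = np.log(1 - d) + sc.T @ np.log(b) + (1 - sc).T @ np.log(1 - b)
        gap = np.clip(log_false - log_true, -50, 50)
        z = np.clip(1.0 / (1.0 + np.exp(gap)), eps, 1 - eps)
    return z, a, b

# Hypothetical toy data: sources 0-2 mostly agree, source 3 reports other claims.
SC_toy = np.array([[1, 1, 1, 0, 0],
                   [1, 1, 1, 0, 0],
                   [1, 1, 0, 0, 0],
                   [0, 0, 0, 1, 1]])
z, a, b = em_truth_discovery(SC_toy)
```

On this toy matrix the estimate favors the claims backed by the three agreeing sources. EM of this kind is sensitive to initialization, so the starting point matters in practice.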
RHV: Optimal Hypothesis Validation
• Optimization formulation (based on the CH matrix)
• Approach: Weighted Mean Algorithm
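A weighted mean over claim truthfulness, with the CH matrix entries as weights, might look like the sketch below. The slide does not give the formula, so the normalization by column sums, the 0.5 decision threshold, and the toy values are assumptions.

```python
import numpy as np

def validate_hypotheses(ch, claim_truth, threshold=0.5):
    """Score each hypothesis as the CH-weighted mean of claim truthfulness.

    ch: N-by-K claim-hypothesis correlation matrix.
    claim_truth: length-N vector of estimated claim truthfulness in [0, 1].
    Returns (scores, labels); a hypothesis with no correlated claims scores 0.
    """
    ch = np.asarray(ch, dtype=float)
    claim_truth = np.asarray(claim_truth, dtype=float)
    weights = ch.sum(axis=0)
    scores = np.divide(ch.T @ claim_truth, weights,
                       out=np.zeros_like(weights), where=weights > 0)
    return scores, scores >= threshold

# Hypothetical toy data: 3 claims, 2 hypotheses.
CH = [[0.7, 0.0],
      [0.3, 0.1],
      [0.0, 0.9]]
claim_truth = [1.0, 1.0, 0.0]
scores, labels = validate_hypotheses(CH, claim_truth)
```

Here the first hypothesis is backed only by claims estimated to be true, while the second depends mostly on a claim estimated to be false, so only the first passes the threshold.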
Evaluation: A Real World Application
• Unreliable and noisy tweets
• Unreliable and hypothesis-ignorant users
Data traces:
• Paris Charlie Hebdo Attack, Nov. 2015
• Oregon Shooting, Oct. 2015
• Baltimore Riots, April 2015
Evaluation: Data Collection
http://apollo.cse.nd.edu/index.html
• Keywords/Location
• RHV is integrated as an option for data analysis
Evaluation: Real-World Application
[Table: Data Trace Statistics]
Hypothesis Set Generation:
• 5 independent individuals serve as end users
• Each individual generated 30 hypotheses for each dataset
• Clean up the hypothesis set by removing redundant and non-conclusive ones
• Manually collect ground truth labels for evaluation purposes
Evaluation: Performance Comparison (1/3)
Paris Attack Data Trace (2015)
Similar results are observed in the other two datasets.
Evaluation: Performance Comparison (2/3)
Evaluation: Performance Comparison (3/3)
Execution Time Comparison
RHV (our approach) is among the fastest of the compared schemes across the different datasets.
Future Work
• Explore more comprehensive claim and hypothesis matching approaches
• Consider a hierarchical structure from claims to hypotheses
• Explore logical relationships between hypotheses
• Validate the developed models on applications beyond Twitter
Conclusion
• This paper formulates a new hypothesis validation problem in social sensing
• It develops a reliable hypothesis validation (RHV) framework that addresses two technical challenges (claim-hypothesis matching and hypothesis correctness validation)
• The framework is evaluated using real-world social sensing data collected from Twitter feeds
Thank You!
Social Sensing Lab, University of Notre Dame
http://www3.nd.edu/~sslab/
dwang5@nd.edu