Opinion Mining and Sentiment Analysis via Divide & Conquer

  • Slides: 43

Opinion Mining and Sentiment Analysis via Divide & Conquer
Bing Liu
University of Illinois at Chicago
liub@cs.uic.edu

Introduction
• Two main types of textual information:
  ◦ Facts and opinions
• Most current text information processing methods (e.g., web search, text mining) work with factual information.
• Sentiment analysis or opinion mining:
  ◦ the computational study of opinions, sentiments and emotions expressed in text.
• Why opinion mining now? Mainly because of the Web: huge volumes of opinionated text.
Bing Liu, Workshop on Mining User-Generated Content, SMU, August 8, 2009

Introduction – user-generated media
• Importance of opinions:
  ◦ Opinions are so important that whenever we need to make a decision, we want to hear others’ opinions.
  ◦ In the past:
    ▪ Individuals: opinions from friends and family
    ▪ Businesses: surveys, focus groups, consultants …
• Word-of-mouth on the Web:
  ◦ User-generated media: one can express opinions on anything in reviews, forums, discussion groups, blogs, …
  ◦ Opinions on a global scale, no longer limited to:
    ▪ Individuals: one’s circle of friends
    ▪ Businesses: small-scale surveys, tiny focus groups, etc.

Introduction – opinions
• Opinions are usually subjective expressions that describe people’s sentiments, appraisals or feelings toward entities, events and their properties.
• The concept of the word opinion is very broad.
• In this talk, we focus only on opinion expressions that convey people’s positive or negative sentiments related to appraisals or evaluations.

An Example Review
• “I bought an iPhone a few days ago. It was such a nice phone. The touch screen was really cool. The voice quality was clear too. Although the battery life was not long, that is ok for me. However, my mother was mad with me as I did not tell her before I bought the phone. She also thought the phone was too expensive, and wanted me to return it to the shop. …”
• What do we see?
  ◦ Opinions, targets of opinions, and opinion holders

Target Object (Liu, Web Data Mining book, 2006)
• Definition (object): An object o is a product, person, event, organization, or topic. o is represented as
  ◦ a hierarchy of components, sub-components, and so on;
  ◦ each node represents a component and is associated with a set of attributes of the component.
• An opinion can be expressed on any node or attribute of the node.
• To simplify our discussion, we use the term features to represent both components and attributes.
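The object definition above describes a tree. A minimal Python sketch, assuming a hypothetical `Component` class and made-up iPhone components and attributes (chosen only for illustration), shows one way to store the hierarchy and flatten it into the set of "features" an opinion may target:

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """A node in an object's hierarchy: a component plus its attributes."""
    name: str
    attributes: set[str] = field(default_factory=set)
    children: list["Component"] = field(default_factory=list)

    def add(self, child: "Component") -> "Component":
        """Attach a sub-component and return it so it can be extended further."""
        self.children.append(child)
        return child

    def features(self) -> set[str]:
        """All component names and attributes in this subtree.

        Mirrors the slide's simplification: components and attributes
        are both called "features".
        """
        result = {self.name} | self.attributes
        for child in self.children:
            result |= child.features()
        return result


# Illustrative hierarchy; these components and attributes are assumptions.
phone = Component("iPhone", {"price", "weight"})
phone.add(Component("touch screen", {"size", "resolution"}))
phone.add(Component("battery", {"battery life"}))
```

Because an opinion can target any node or attribute, keeping the tree (rather than only the flat set) lets later steps tell a component apart from one of its attributes.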

What is an Opinion? (Liu, a Ch. in NLP handbook)
• An opinion is a quintuple (o_j, f_jk, so_ijkl, h_i, t_l), where
  ◦ o_j is a target object;
  ◦ f_jk is a feature of the object o_j;
  ◦ so_ijkl is the sentiment value of the opinion holder h_i on feature f_jk of object o_j at time t_l; so_ijkl is +ve, –ve, or neu, or a more granular rating;
  ◦ h_i is an opinion holder;
  ◦ t_l is the time when the opinion is expressed.
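The quintuple can be held as a simple record type. This is a minimal Python sketch in which the class and field names are hypothetical. The holder labels and the date are placeholders because the example review gives neither its author's name nor a posting time. Only the sentiments are taken from the review text.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Sentiment(Enum):
    POSITIVE = "+ve"
    NEGATIVE = "-ve"
    NEUTRAL = "neu"


@dataclass(frozen=True)
class Opinion:
    """The quintuple (o_j, f_jk, so_ijkl, h_i, t_l)."""
    obj: str              # o_j: target object
    feature: str          # f_jk: feature of o_j
    sentiment: Sentiment  # so_ijkl: sentiment of h_i on f_jk of o_j at t_l
    holder: str           # h_i: opinion holder
    time: date            # t_l: when the opinion was expressed


# Quintuples one might extract from the iPhone review. The holder labels
# and the date are placeholders; the review states neither.
posted = date(2009, 8, 1)
opinions = [
    Opinion("iPhone", "touch screen", Sentiment.POSITIVE, "author", posted),
    Opinion("iPhone", "voice quality", Sentiment.POSITIVE, "author", posted),
    Opinion("iPhone", "price", Sentiment.NEGATIVE, "author's mother", posted),
]
```

Making the record frozen keeps extracted quintuples hashable, so they can be deduplicated or counted directly.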

Objective – structure the unstructured
• Objective: Given an opinionated document,
  ◦ discover all quintuples (o_j, f_jk, so_ijkl, h_i, t_l), i.e., mine the five corresponding pieces of information in each quintuple, or
  ◦ solve some simpler problems.
• With the quintuples, unstructured text becomes structured data:
  ◦ Traditional data and visualization tools can be used to slice, dice and visualize the results in all kinds of ways.
  ◦ This enables qualitative and quantitative analysis.

Sentiment Classification: doc-level (Pang and Lee, Survey, 2008)
• Classify a document (e.g., a review) based on the overall sentiment expressed by the opinion holder.
  ◦ Classes: positive or negative
• Assumption: each document focuses on a single object and contains opinions from a single opinion holder.
• E.g., thumbs-up or thumbs-down?
  ◦ “I bought an iPhone a few days ago. It was such a nice phone. The touch screen was really cool. The voice quality was clear too. Although the battery life was not long, that is ok for me. However, my mother was mad with me as I did not tell her before I bought the phone. She also thought the phone was too expensive, and wanted me to return it to the shop. …”
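To make document-level classification concrete, here is a deliberately simple lexicon-count baseline. It is not one of the supervised methods surveyed by Pang and Lee. The word lists are tiny hypothetical lexicons, and flipping a word that follows "not/no/never" is a crude assumed negation rule.

```python
import re

# Tiny illustrative lexicons (assumptions, not a published resource).
POSITIVE = {"nice", "cool", "clear", "great", "good", "easy", "amazing"}
NEGATIVE = {"bad", "poor", "mad", "expensive", "scratched", "broke", "disappointment"}
NEGATORS = {"not", "no", "never"}


def document_score(text: str) -> int:
    """Sum +1/-1 per sentiment word, flipping a word preceded by a negator."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, tok in enumerate(tokens):
        polarity = (tok in POSITIVE) - (tok in NEGATIVE)
        if polarity and i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    return score


def classify_document(text: str) -> str:
    """Thumbs-up or thumbs-down; a zero score is reported as positive (arbitrary)."""
    return "positive" if document_score(text) >= 0 else "negative"
```

On the iPhone review, this counts nice/cool/clear against mad/expensive, giving a score of +1, so the review is classified as positive. The mother's negative words are folded into the same score, which is exactly the situation the single-holder assumption ignores.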

Subjectivity Analysis: sent.-level (Wiebe et al., 2004)
• Sentence-level sentiment analysis has two tasks:
  ◦ Subjectivity classification: subjective or objective.
    ▪ Objective: e.g., I bought an iPhone a few days ago.
    ▪ Subjective: e.g., It is such a nice phone.
  ◦ Sentiment classification: for subjective sentences or clauses, classify positive or negative.
    ▪ Positive: It is such a nice phone.
• But (Liu, a Ch. in NLP handbook):
  ◦ subjective sentences ≠ +ve or –ve opinions
    ▪ E.g., I think he came yesterday.
  ◦ objective sentence ≠ no opinion
    ▪ Implied –ve opinion: The phone broke in two days.
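The two tasks can be chained so that the sentiment step runs only on sentences the subjectivity step accepts. In this Python sketch, both stages are hypothetical keyword rules standing in for trained classifiers. Running it on the slide's examples reproduces both caveats: a subjective sentence with no polarity, and an objective sentence whose implied negative opinion is missed.

```python
import re

# Hypothetical word lists standing in for trained classifiers.
POSITIVE = {"nice", "cool", "clear", "great", "good"}
NEGATIVE = {"bad", "poor", "expensive", "mad"}
SUBJECTIVE_CUES = {"think", "feel", "believe", "seems"}


def tokens(sentence: str) -> list[str]:
    return re.findall(r"[a-z]+", sentence.lower())


def is_subjective(sentence: str) -> bool:
    """Stage 1: subjective if the sentence has a sentiment word or a subjective cue."""
    words = set(tokens(sentence))
    return bool(words & (POSITIVE | NEGATIVE | SUBJECTIVE_CUES))


def classify_sentence(sentence: str) -> str:
    """Stage 2 (polarity) runs only on sentences that stage 1 marks subjective."""
    if not is_subjective(sentence):
        return "objective"
    words = tokens(sentence)
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "subjective, no polarity"
```

With these rules, "I think he came yesterday." comes out as subjective but without polarity. "The phone broke in two days." comes out as objective, so its implied negative opinion is lost.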

Feature-Based Sentiment Analysis
• Sentiment classification at both the document and sentence (or clause) levels is not enough:
  ◦ it does not tell what people like and/or dislike;
  ◦ a positive opinion on an object does not mean that the opinion holder likes everything;
  ◦ a negative opinion on an object does not mean …
• Objective (recall): discovering all quintuples (o_j, f_jk, so_ijkl, h_i, t_l).
• With all quintuples, all kinds of analyses become possible.

Feature-Based Opinion Summary (Hu & Liu, KDD-2004)
“I bought an iPhone a few days ago. It was such a nice phone. The touch screen was really cool. The voice quality was clear too. Although the battery life was not long, that is ok for me. However, my mother was mad with me as I did not tell her before I bought the phone. She also thought the phone was too expensive, and wanted me to return it to the shop. …”
…
Feature Based Summary:
Feature 1: Touch screen
  Positive: 212
  • The touch screen was really cool.
  • The touch screen was so easy to use and can do amazing things.
  …
  Negative: 6
  • The screen is easily scratched.
  • I have a lot of difficulty in removing finger marks from the touch screen.
  …
Feature 2: battery life
…
Note: We omit opinion holders.
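Once opinions have been extracted, a summary like the one above is mostly an aggregation step. This Python sketch assumes the extraction has already produced (feature, polarity, sentence) triples with polarity "positive" or "negative". The function names are hypothetical, and the sketch does not cover how Hu & Liu extract features.

```python
from collections import defaultdict


def feature_summary(opinions, max_examples=2):
    """Group (feature, polarity, sentence) triples into a feature-based summary.

    Returns {feature: {"positive": (count, examples), "negative": (count, examples)}}.
    Polarity must be "positive" or "negative".
    """
    grouped = defaultdict(lambda: {"positive": [], "negative": []})
    for feature, polarity, sentence in opinions:
        grouped[feature][polarity].append(sentence)
    return {
        feature: {pol: (len(sents), sents[:max_examples]) for pol, sents in by_pol.items()}
        for feature, by_pol in grouped.items()
    }


def render(summary):
    """Format the summary in the slide's "Feature n: ..." layout."""
    lines = []
    for i, (feature, by_pol) in enumerate(summary.items(), start=1):
        lines.append(f"Feature {i}: {feature}")
        for pol in ("positive", "negative"):
            count, examples = by_pol[pol]
            lines.append(f"  {pol.capitalize()}: {count}")
            lines.extend(f"    - {s}" for s in examples)
    return "\n".join(lines)
```

Keeping the full counts separate from the capped example list mirrors the slide, which reports totals (212, 6) but shows only a few sentences per side.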

Visual Comparison (Liu et al., WWW-2005)
• Summary of reviews of Cell Phone 1
• Comparison of reviews of Cell Phone 1 and Cell Phone 2
[Figure: positive (+) and negative (–) review summaries per feature: Voice, Screen, Battery, Size, Weight]

Feat.-based opinion summary in
[Figure not recoverable]

Sentiment Analysis is Hard!
• “This past Saturday, I bought a Nokia phone and my girlfriend bought a Motorola phone with Bluetooth. We called each other when we got home. The voice on my phone was not so clear, worse than my previous phone. The battery life was long. My girlfriend was quite happy with her phone. I wanted a phone with good sound quality. So my purchase was a real disappointment. I returned the phone yesterday.”

Senti. Analy. is not Just ONE Problem
• (o_j, f_jk, so_ijkl, h_i, t_l):
  ◦ o_j - a target object: Named Entity Extraction (more)
  ◦ f_jk - a feature of o_j: Information Extraction
  ◦ so_ijkl is sentiment: Sentiment determination
  ◦ h_i is an opinion holder: Information/Data Extraction
  ◦ t_l is the time: Data Extraction
• Coreference resolution
• Relation extraction
• Word sense disambiguation
• Synonym match (voice = sound quality)
• …
• None of them is a solved problem!

Accuracy is Still an Issue!
• Some commercial solutions give clients several example opinions in their reports.
  ◦ Why not all? Accuracy could be the problem.
• Accuracy: both
  ◦ Precision: how accurate are the discovered opinions?
  ◦ Recall: how much is left undiscovered?
• Which sentence is more interesting? (cordless phone review)
  ◦ (1) The voice quality is great.
  ◦ (2) I put the base in the kitchen, and I can hear clearly from the handset in the bedroom, which is very far.
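Precision and recall can be computed by comparing the set of extracted opinions with a manually annotated gold set. In this Python sketch, the opinions are (sentence id, feature, polarity) tuples. The gold annotation of the two cordless-phone sentences is hypothetical: both are read as praising voice quality, with sentence (2) doing so only implicitly.

```python
def precision_recall(extracted: set, gold: set) -> tuple[float, float]:
    """Precision = |correct| / |extracted|; recall = |correct| / |gold|.

    An empty set yields 0.0 for the corresponding measure (a convention).
    """
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical gold annotation: both sentences praise voice quality,
# sentence (2) only implicitly.
gold = {(1, "voice quality", "positive"), (2, "voice quality", "positive")}
# A system that catches only the explicit opinion in sentence (1).
extracted = {(1, "voice quality", "positive")}
p, r = precision_recall(extracted, gold)
```

Here precision is 1.0 but recall is 0.5. A system can look perfectly accurate on what it reports while missing the more interesting implicit opinion.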

Easier and Harder Problems
• Reviews are easier.
  ◦ Objects/entities are (almost) given, and there is little noise.
• Forum discussions and blogs are harder.
  ◦ Objects are not given, and there is a large amount of noise.
• Determining sentiments seems to be easier.
• Determining objects and their corresponding features is harder.
• Combining them is even harder.

Manual to Automation
• Ideally, we want an automated solution that can scale up.
  ◦ Type an object name and then get +ve and –ve opinions in a summarized form.
  ◦ Unfortunately, that will not happen any time soon.
• Manual ----------|---- Full Automation
  ◦ Some real creativity is needed to produce a scalable and accurate solution.

I am Optimistic
• Significant research is going on in several academic communities:
  ◦ NLP, Web, data mining, information retrieval, …
  ◦ New ideas and techniques are coming all the time.
• Industry is also trying different strategies, and solving some useful aspects of the problem.
• I believe a reasonably accurate solution will be out in the next few years.
  ◦ Use a combination of algorithms.

Problem Matrix (in OM/SA context)
• Horizontally:
  ◦ Sentiment determination
  ◦ Named entity extraction (more)
  ◦ Relation extraction
  ◦ Word sense disambiguation
  ◦ Coreference resolution
  ◦ …
• Vertically:
  ◦ Conditional sentences
  ◦ Comparative sentences
  ◦ Sentences with negations
  ◦ Sentences with contrary words
  ◦ …

Divide and Conquer
• Most of the existing approaches try to solve the whole problem (i.e., all sub-problems).
• But there is unlikely to be a one-technique-fits-all solution.
  ◦ Horizontal problems are all very different.
  ◦ Different vertical sentences can have very different semantic meanings.
• Let us look at some vertical cases. (Horizontal problems are better known in NLP.)

Some example types of sentences
• Normal sentences with opinions: sentiment expressions on some target objects, e.g., products, events, topics, persons.
  ◦ E.g., “the picture quality of this camera is great.”
• Conditional sentences with opinions: sentences with conditions and consequents.
  ◦ E.g., If you like this Nokia phone, just buy it.
• Comparative sentences with opinions: comparisons expressing similarities or differences of more than one object, usually stating an ordering or preference.
  ◦ E.g., “car x is cheaper than car y.”

Conditional sentences (EMNLP-2009)
• Example 1: If someone makes a beautiful and reliable car, I will buy it.
• Example 2: If your Nokia phone is not good, buy this great Samsung phone.
• Example 3: If you are looking for a phone with good voice quality, don’t buy this Nokia phone.
• Example 4: If you want a good Nokia phone, go and read Amazon reviews.

Motivation
• Are there enough conditionals to warrant a focused study?
• How easy/hard is it to integrate these analysis techniques with other sentiment systems?
[Figure: Percentage of Conditional Sentences, per domain: Cellphone, Automobile, LCD TV, Audio Systems, Medicine]

Conditional Connectives
• Conditional sentences contain a condition clause and a consequent clause that are dependent on each other.
• English contains a number of conditional connectives.
• If is the most commonly used conditional connective.
• Most conditionals can be logically expressed as “If P then Q”.
• Other connectives include only if, unless, even if, provided that, as long as, etc.
[Table showing the distribution of conditional connectives]
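Locating the connective and separating the two clauses can be sketched with a regular expression. This is an illustrative heuristic, not the method used in the EMNLP-2009 work. It assumes that a sentence-initial connective ends its condition at the first comma ("If P, Q") and that otherwise the consequent comes first ("Q if P"). Only the connectives listed on the slide are recognized.

```python
import re

# Connectives from the slide; multi-word ones first so that "only if" and
# "even if" are matched as units rather than as plain "if".
CONNECTIVES = ["provided that", "as long as", "only if", "even if", "unless", "if"]
PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, CONNECTIVES)) + r")\b", re.IGNORECASE)


def split_conditional(sentence: str):
    """Return (connective, condition, consequent), or None if no connective is found."""
    match = PATTERN.search(sentence)
    if match is None:
        return None
    connective = match.group(1).lower()
    before = sentence[:match.start()].strip(" ,")
    after = sentence[match.end():].strip().rstrip(".")
    if not before:  # "If P, Q": the condition runs up to the first comma
        condition, _, consequent = after.partition(",")
        return connective, condition.strip(), consequent.strip()
    return connective, after, before  # "Q if P": the consequent precedes the connective
```

For Example 2, `split_conditional("If your Nokia phone is not good, buy this great Samsung phone.")` returns `("if", "your Nokia phone is not good", "buy this great Samsung phone")`. A condition that itself contains a comma would be cut short, which is one reason a parser is preferable in practice.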

Types of Conditional Sentences
• Extensive study of conditionals in linguistics has resulted in a number of classification systems.
• Most of these are based on semantic understanding.
• Canonical tense patterns: a simple classification scheme that can be recognized using a POS tagger.

Types of Conditional Sentences
• 0 conditional: If you heat water, it boils.
  ◦ Simple present / simple present
• 1st conditional: If the acceleration is good, I will buy it.
  ◦ Simple present / (modal) present | past
• 2nd conditional: If the cell phone was robust, I would consider buying it.
  ◦ Past subjunctive / (modal) verb
• 3rd conditional: If I had bought the a767, I would have hated it.
  ◦ Past perfect / present perfect

Rules for identifying Conditionals
• Great car if you need powerful acceleration ⇒ It is a great car if you need powerful acceleration
• Add default rules to increase coverage:
  ◦ If condition contains VB/VBP/VBZ → 0 conditional
  ◦ If consequent contains VB/VBP/VBZ → 0 conditional
  ◦ If condition contains VBG → 1st conditional
  ◦ If condition contains VBD → 2nd conditional
  ◦ If condition contains VBN → 3rd conditional
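These rules operate on POS tags, so they can be sketched as set tests over Penn Treebank tags. The sketch assumes the two clauses have already been separated and tagged by a POS tagger. It first checks an approximation of the canonical tense patterns from the "Types of Conditional Sentences" slide and then falls back to the default rules, reading the VBS of the original slide as the Penn Treebank tag VBZ. The precedence (canonical first, then the most specific default tag first) is an assumption.

```python
from __future__ import annotations

PRESENT = {"VBP", "VBZ"}
BASE_OR_PRESENT = {"VB", "VBP", "VBZ"}


def conditional_type(condition_tags: list[str], consequent_tags: list[str]) -> str | None:
    """Classify a conditional from the Penn Treebank POS tags of its two clauses."""
    cond, cons = set(condition_tags), set(consequent_tags)
    modal = "MD" in cons

    # Approximate canonical tense patterns (checked first).
    if {"VBD", "VBN"} <= cond and modal and "VBN" in cons:  # had bought / would have hated
        return "3rd"
    if "VBD" in cond and modal:                              # was robust / would consider
        return "2nd"
    if cond & PRESENT and modal:                             # is good / will buy
        return "1st"
    if cond & PRESENT and cons & PRESENT:                    # heat / boils
        return "0"

    # Default rules, most specific condition tag first (ordering is an assumption).
    if "VBN" in cond:
        return "3rd"
    if "VBD" in cond:
        return "2nd"
    if "VBG" in cond:
        return "1st"
    if cond & BASE_OR_PRESENT or cons & BASE_OR_PRESENT:
        return "0"
    return None
```

The ordering matters. If the default "consequent contains VB" rule were applied first, the 3rd-conditional example would be labeled 0, because "would have hated" contains the base-form "have".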

Data features used for learning
• Sentiment words/phrases and their locations
• POS tags of sentiment words
• Words indicating no opinion
• Tense patterns
• Special characters
• Conditional connectives
• Length of condition/consequent clauses
• Negation words
• …
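Several of these features can be computed directly from the two clauses. This Python sketch covers only a subset (sentiment-word counts per clause as a proxy for location, negation, connective, clause length, special characters). The word lists and feature names are hypothetical, and POS and tense features are omitted.

```python
import re

# Hypothetical word lists for illustration.
POSITIVE = {"good", "great", "nice", "beautiful", "reliable"}
NEGATIVE = {"bad", "poor", "expensive"}
NEGATIONS = {"not", "no", "never", "don't"}


def words(text: str) -> list[str]:
    return re.findall(r"[a-z']+", text.lower())


def clause_features(prefix: str, clause: str) -> dict:
    """Per-clause features; the prefix records where the sentiment words occur."""
    toks = words(clause)
    return {
        f"{prefix}_pos_words": sum(t in POSITIVE for t in toks),
        f"{prefix}_neg_words": sum(t in NEGATIVE for t in toks),
        f"{prefix}_has_negation": any(t in NEGATIONS for t in toks),
        f"{prefix}_length": len(toks),
    }


def sentence_features(connective: str, condition: str, consequent: str) -> dict:
    """Feature dict for one conditional sentence, ready for a learner."""
    features = {
        "connective": connective,
        "special_chars": sum(c in "!?" for c in condition + consequent),
    }
    features.update(clause_features("cond", condition))
    features.update(clause_features("cons", consequent))
    return features
```

For Example 3, the positive word "good" lands in the condition while the negation "don't" lands in the consequent. Separating the two clauses makes this distinction visible to the learner.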

Classification Strategies
• Clause-based classifier
• Consequent classifier
• Whole-sentence-based classifier

Clause-based classifier
• Condition classifier + consequent classifier
• Annotate condition and consequent clauses separately.
• Combine the two to perform topic-based classification:
  ◦ if the topic is in the condition clause, use the condition classifier; otherwise, use the consequent classifier.
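The combination rule above is a routing decision per topic. In this Python sketch, the two classifiers are passed in as plain functions. `toy_clf` is a hypothetical keyword stand-in for a trained clause classifier, and topic membership is tested by simple substring matching.

```python
from typing import Callable

# A clause classifier maps clause text to "positive", "negative" or "neutral".
ClauseClassifier = Callable[[str], str]


def clause_based(topic: str, condition: str, consequent: str,
                 condition_clf: ClauseClassifier,
                 consequent_clf: ClauseClassifier) -> str:
    """Label one topic with the classifier for the clause that mentions it."""
    if topic.lower() in condition.lower():
        return condition_clf(condition)
    return consequent_clf(consequent)


def toy_clf(clause: str) -> str:
    """Hypothetical stand-in for a trained clause classifier."""
    words = clause.lower().split()
    if "not" in words:
        return "negative"
    if "good" in words or "great" in words:
        return "positive"
    return "neutral"
```

With Example 2 split into "your Nokia phone is not good" and "buy this great Samsung phone", the topic "Nokia" is routed to the condition and labeled negative. "Samsung" goes to the consequent and is labeled positive.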

Consequent-based classifier
• In many cases, the condition clause contains no opinion, whereas the consequent clause reflects the sentiment of the entire sentence.
• Use the consequent classifier only:
  ◦ if the consequent is positive, all topics in the sentence are classified as positive.
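The consequent-only strategy reduces to classifying one clause and copying its label to every topic. In this minimal Python sketch, `toy_consequent_clf` is a hypothetical keyword stand-in for a trained classifier, and the topics are assumed to have been identified already.

```python
from typing import Callable, Iterable


def consequent_based(topics: Iterable[str], consequent: str,
                     consequent_clf: Callable[[str], str]) -> dict[str, str]:
    """Give every topic in the sentence the label predicted for the consequent."""
    label = consequent_clf(consequent)
    return {topic: label for topic in topics}


def toy_consequent_clf(clause: str) -> str:
    """Hypothetical stand-in for a trained consequent classifier."""
    words = clause.lower().split()
    if "don't" in words or "not" in words:
        return "negative"
    if "buy" in words:
        return "positive"
    return "neutral"
```

For Example 3, both "Nokia phone" and "voice quality" receive the consequent's negative label. This works here because the condition ("looking for … good voice quality") carries no opinion of its own.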

Whole-sentence-based classifier
• Predict the opinion of each topic in a sentence separately.
• Uses topic location and opinion weight features (discussed earlier), in addition to the normal features.
• Each instance in the training/test set represents a topic.

Comparative Sentences (Jindal and Liu, 2006)
• Gradable
  ◦ Non-equal gradable: relations of the type greater or less than
    ▪ Ex: “optics of camera A is better than that of camera B”
  ◦ Equative: relations of the type equal to
    ▪ Ex: “camera A and camera B both come in 7MP”
  ◦ Superlative: relations of the type greater or less than all others
    ▪ Ex: “camera A is the cheapest camera available in market”
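A first approximation of these three types can come from Penn Treebank tags: JJR/RBR mark comparatives (better, cheaper) and JJS/RBS mark superlatives (cheapest). This Python heuristic is only illustrative and is not the method of Jindal and Liu. The equative cue list is a small hypothetical sample, and the input is assumed to be already POS-tagged.

```python
from typing import Optional

# Equative cue words (an illustrative, incomplete list).
EQUATIVE_CUES = {"same", "both", "equal", "similar"}


def comparative_type(tagged: list[tuple[str, str]]) -> Optional[str]:
    """Guess the comparative type of a POS-tagged sentence (Penn Treebank tags)."""
    tags = {tag for _, tag in tagged}
    words = {word.lower() for word, _ in tagged}
    if tags & {"JJS", "RBS"}:   # e.g. cheapest, most
        return "superlative"
    if tags & {"JJR", "RBR"}:   # e.g. better, cheaper, more
        return "non-equal gradable"
    if words & EQUATIVE_CUES:
        return "equative"
    return None
```

Keyword and tag cues over-trigger easily (for example, "more" in a non-comparative sense). This is one reason the vertical problem of comparatives deserves its own techniques rather than a single shared one.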

Mining Comparative Opinions
• Objective: Given an opinionated document d, extract comparative opinions:
  (O1, O2, F, po, h, t),
  where O1 and O2 are the object sets being compared based on their shared features F, po is the preferred object set of the opinion holder h, and t is the time when the comparative opinion is expressed.
• Note: these are not positive or negative opinions.

An example
• Consider the comparative sentence
  ◦ “Canon’s optics is better than those of Sony and Nikon.”
  ◦ written by John on May 1, 2009.
• The extracted comparative opinion is:
  ◦ ({Canon}, {Sony, Nikon}, {optics}, preferred: {Canon}, John, May-1-2009)
• Note:
  ◦ Some horizontal problems are somewhat constrained in this context.
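To show how the example sentence maps onto (O1, O2, F, po, h, t), here is a toy Python extractor. It handles only the single sentence shape "X's F is better than those of Y and Z". The regular expression and the class name are hypothetical, and a real extractor would need the horizontal techniques (entity and feature extraction) mentioned above.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ComparativeOpinion:
    """The comparative opinion tuple (O1, O2, F, po, h, t)."""
    o1: frozenset[str]         # O1: objects on one side of the comparison
    o2: frozenset[str]         # O2: objects on the other side
    features: frozenset[str]   # F: shared features being compared
    preferred: frozenset[str]  # po: preferred object set
    holder: str                # h: opinion holder
    time: date                 # t: when the opinion was expressed


# Toy pattern for "X's F is better than those of Y and Z." (straight or curly apostrophe).
BETTER_THAN = re.compile(r"(\w+)['’]s (\w+) is better than those of (.+?)\.?$")


def extract(sentence: str, holder: str, time: date) -> Optional[ComparativeOpinion]:
    match = BETTER_THAN.match(sentence)
    if match is None:
        return None
    winner, feature, others = match.groups()
    losers = frozenset(n.strip() for n in re.split(r",|\band\b", others) if n.strip())
    # With "better", the first-mentioned object is the preferred one.
    return ComparativeOpinion(frozenset({winner}), losers, frozenset({feature}),
                              frozenset({winner}), holder, time)
```

Applied to the slide's sentence with holder "John" and date 2009-05-01, it yields ({Canon}, {Sony, Nikon}, {optics}, {Canon}, John, 2009-05-01). Frozen sets are used because each position in the tuple is a set of objects, not a single name.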

Opinion Spam Detection (Jindal and Liu, 2007)
• Fake/untruthful reviews:
  ◦ Write undeserved positive reviews for some target objects in order to promote them.
  ◦ Write unfair or malicious negative reviews for some target objects to damage their reputations.
• An increasing number of customers are wary of fake reviews (biased reviews, paid reviews).

An Example of Practice of Review Spam
Belkin International, Inc.
• Top networking and peripherals manufacturer | Sales ~ $500 million in 2008
• Posted an ad for writing fake reviews on amazon.com (65 cents per review), Jan 2009

Experiments with Amazon Reviews
• June 2006
  ◦ 5.8 million reviews, 1.2 million products and 2.1 million reviewers.
• A review has 8 parts:
  <Product ID> <Reviewer ID> <Rating> <Date> <Review Title> <Review Body> <Number of Helpful Feedbacks> <Number of Feedbacks>
• Industry-manufactured products (“mProducts”), e.g., electronics, computers, accessories, etc.
  ◦ 228K reviews, 36K products and 165K reviewers.

Some Tentative Results
• Negative outlier reviews tend to be heavily spammed.
• Reviews that are the only reviews of some products are likely to be spammed.
• Top-ranked reviewers are more likely to be spammers.
• Spam reviews can get good helpfulness feedback, and non-spam reviews can get bad feedback.
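The first two observations suggest simple per-review signals. This Python sketch flags a review as a "negative outlier" when its rating is at least `outlier_gap` stars below the mean of the other reviews of the same product. It also flags the sole review of a product. The `Review` fields are a subset of the eight parts listed above, and the threshold is an arbitrary assumption. These are signals for further inspection, not the detection method of Jindal and Liu.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Review:
    product_id: str
    reviewer_id: str
    rating: int  # star rating, e.g. 1-5


def spam_signals(reviews: list[Review], outlier_gap: float = 2.0) -> list[set[str]]:
    """Return, for each review, the set of heuristic spam signals it raises."""
    ratings_by_product = defaultdict(list)
    for review in reviews:
        ratings_by_product[review.product_id].append(review.rating)

    signals = []
    for review in reviews:
        ratings = ratings_by_product[review.product_id]
        flags = set()
        if len(ratings) == 1:
            flags.add("only review of product")
        else:
            # Mean rating of the *other* reviews of the same product.
            others_mean = (sum(ratings) - review.rating) / (len(ratings) - 1)
            if others_mean - review.rating >= outlier_gap:
                flags.add("negative outlier")
        signals.append(flags)
    return signals
```

Excluding the review itself from the product mean keeps a single extreme rating from pulling the baseline toward itself. This matters most for products with only a few reviews.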

Summary
• We briefly defined and introduced the opinion mining problem.
• We need a divide-and-conquer approach, e.g.,
  ◦ Conditional sentences
  ◦ Comparative sentences
  ◦ Opinion spam detection: fake reviews
• Technical challenges are still huge.
• But I am optimistic. Accurate solutions will be out in the next few years.

References
• B. Liu, “Sentiment Analysis and Subjectivity.” A chapter in Handbook of Natural Language Processing, 2nd Edition, 2010.
  ◦ (An earlier version) B. Liu, “Opinion Mining.” A chapter in the book Web Data Mining, Springer, 2006.
• B. Pang and L. Lee, “Opinion Mining and Sentiment Analysis.” Foundations and Trends in Information Retrieval 2(1-2), 2008.