“ALEXANDRU IOAN CUZA” UNIVERSITY OF IAŞI
FACULTY OF COMPUTER SCIENCE
The Semantics and Pragmatics of Natural Language
Daniela GÎFU
daniela.gifu@info.uaic.ro

SENTIMENT ANALYSIS – AN OVERVIEW

What is Sentiment Analysis?

IMPACT OF TOPIC
- Sentiment Analysis (SA) is one of the most active topics in NLP.
- SA makes it possible to monitor, identify and understand, in real time, consumers' feelings and attitudes towards brands or topics in cyberspace, and to act accordingly.
- SA is very popular in social media.
- Target: academia and industry.

08.05.2012

IMPACT IN SOCIAL MEDIA
Social media deals with personal and socially related opinions.
SA plays a vital role in understanding the opinions expressed in conversations, posts, blogs, etc., and in deriving a sensible short summary of the most relevant opinions.
SA helps to:
• take quick decisions
• change the strategy and tactics used
• understand the mood of the market
• keep up with changing trends
• improve one’s product

VALIDITY OF SA
- evaluated by comparing the sentiment scores of specific comments with their respective star ratings, which are common cues that individuals use to filter what they read during information acquisition.
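One simple way to compare sentiment scores with star ratings is a correlation coefficient. The sketch below is only an illustration of that idea, not the evaluation procedure used in the lecture: the function name `pearson` and the sample data are invented, and Pearson correlation is an assumed choice of measure.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up illustrative data: sentiment scores in [-1, 1] and star ratings 1..5.
sentiment_scores = [-0.8, -0.3, 0.0, 0.4, 0.9]
star_ratings = [1, 2, 3, 4, 5]

r = pearson(sentiment_scores, star_ratings)
print(f"Pearson r = {r:.3f}")
```

A value of r close to +1 would suggest that the sentiment scores track the star ratings; a rank-based measure could be substituted if the ratings are treated as ordinal.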

RESEARCH QUESTIONS...
• How comparable are sentiment scores for reviews/comments to their respective star ratings?
• How do sentiment scores impact decision outcomes?

PURPOSE AND MOTIVATION
- to create a complete state of the art (SOTA) in SA, with a focus on social media posts;
- to enhance the results of context-based SA;
- to clarify the descriptive behavior of the receptor, who is affected by the multitude of information on forums;
- to improve the performance of SA classifiers based on two approaches (machine learning and lexicon).

CONTENT
1. Introduction
2. A general view on the subject
3. SA levels
 3.1. SA at document level
 3.2. SA at clause/sentence level
 3.3. Feature-based SA
 3.4. Comparative sentiment analysis
 3.5. Sentiment lexicon acquisition
 3.6. Conclusions
4. Applications
 4.1. Business and government
 4.2. Review sites
 4.3. Other domains: politics and sociology
 4.4. Conclusions
5. Conclusions and discussions

2. A general view on the subject
SA - the process of detecting the contextual polarity of text.
SA – terminology:
- subjectivity [Lyons 1981; Langacker 1985];
- evidentiality [Chafe and Nichols 1986];
- analysis of stance [Biber and Finegan 1988; Conrad and Biber 2000];
- affect [Batson, Shaw, and Oleson 1992];
- point of view [Wiebe 1994; Scheibman 2002];
- evaluation [Hunston and Thompson 2001];
- appraisal [Martin and White 2005];
- opinion mining [Pang and Lee 2008];
- politeness [Gîfu and Topor 2014].

3. Sentiment classification techniques Fig. 1 Sentiment classification techniques

3. SA levels - document
a) supervised approach
Classes: Positive, Negative, Neutral
Fig. 2 Supervised learning – for three classes

3. SA levels - document
a) supervised approach
Fig. 2 Python NLTK Demos for Natural Language Text Processing
http://text-processing.com/demo/
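The three-class supervised setting can be illustrated with a minimal Naive Bayes sketch in plain Python. This is not the implementation behind the NLTK demo; the training sentences, the function names `train_naive_bayes` and `classify`, and the choice of add-one smoothing are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from math import log


def train_naive_bayes(labelled_docs):
    """Collect per-class document counts and word counts from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labelled_docs:
        tokens = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary


def classify(text, model):
    """Return the class with the highest log-probability (add-one smoothing)."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        score = log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocabulary:
                continue  # ignore words never seen in training
            score += log((word_counts[label][token] + 1)
                         / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training set, invented for illustration only.
training = [
    ("great phone excellent battery", "positive"),
    ("love this excellent camera", "positive"),
    ("terrible screen awful battery", "negative"),
    ("hate this awful phone", "negative"),
    ("the phone has a battery", "neutral"),
    ("this is a camera", "neutral"),
]
model = train_naive_bayes(training)
print(classify("excellent camera", model))
print(classify("awful screen", model))
```

A real classifier would need far more training data, proper tokenization and a held-out test set; the sketch only shows how labelled examples drive the positive/negative/neutral decision.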

3. SA levels - document
b) unsupervised approach
Based on determining the semantic orientation (SO) of specific words/phrases:
1. Sentiment lexicon (words/expressions) – [Taboada et al., 2011]
2. Set of predefined POS patterns – [Turney, 2002]
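A lexicon-based estimate of semantic orientation can be sketched as follows. The mini-lexicon, its values and the averaging rule are hypothetical; the cited works use much larger lexicons and additional handling (intensifiers, negation) that is omitted here.

```python
# Hypothetical mini-lexicon: word -> semantic orientation (SO) value.
SO_LEXICON = {"good": 2, "excellent": 4, "bad": -2, "horrible": -4, "slow": -1}


def semantic_orientation(text, lexicon=SO_LEXICON):
    """Average the SO values of the lexicon words found in the text."""
    tokens = [t.strip(".,!?;:").lower() for t in text.split()]
    hits = [lexicon[t] for t in tokens if t in lexicon]
    if not hits:
        return 0.0  # no opinion words found
    return sum(hits) / len(hits)


def polarity(text, lexicon=SO_LEXICON):
    """Map the averaged SO value to a coarse polarity label."""
    so = semantic_orientation(text, lexicon)
    if so > 0:
        return "positive"
    if so < 0:
        return "negative"
    return "neutral"


print(semantic_orientation("Excellent screen, but slow."))
print(polarity("Excellent screen, but slow."))
```

No training data is needed, which is what makes the approach unsupervised; its quality depends entirely on the coverage of the lexicon.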

3. SA levels – clause/sentence
More complex: identifying whether a sentence is opinionated and establishing the nature of the opinion.
- using supervised methods:
1. classifying clauses into two classes [Yu and Hatzivassiloglou, 2003]
2. an approach based on minimum cuts [Pang and Lee, 2004]
The problem: how can we classify questions, sarcasm, metaphor, humor, etc.?

3. SA levels – comparative
- when a user does not offer a direct opinion about a product [Jindal and Liu, 2006].
Example: Dacia Logan arată mult mai bine decât Dacia Solenza. (En. - Dacia Logan looks much better than Dacia Solenza.)
- adverbial adjectives: mai mult, mai puţin (En. - more, less)
- superlative adjectives and adverbs: mai, cel puţin (En. - more, at least)
- additional clauses: decât, împotriva (En. - rather than, against).
These cues cover 98% of the comparative opinions.
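The cue words listed above can be used for a first-pass detection of comparative sentences. The sketch below only checks for the presence of cues as whole words; the function name is invented, and a cue match is a necessary hint rather than proof that the sentence is a comparative opinion.

```python
import re

# Cue words and phrases taken from the slide; longer phrases are listed first.
COMPARATIVE_CUES = ["mai mult", "mai puţin", "cel puţin", "mai", "decât", "împotriva"]


def find_comparative_cues(sentence):
    """Return the cues that occur in the sentence as whole words, in list order."""
    lowered = sentence.lower()
    found = []
    for cue in COMPARATIVE_CUES:
        # (?<!\w) and (?!\w) keep "mai" from matching inside a longer word.
        if re.search(r"(?<!\w)" + re.escape(cue) + r"(?!\w)", lowered):
            found.append(cue)
    return found


example = "Dacia Logan arată mult mai bine decât Dacia Solenza."
print(find_comparative_cues(example))
```

In a fuller system the candidate sentences found this way would be passed to a classifier that decides whether they really express a comparison and which entity is preferred.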

3. SA levels – features
- several entities in each analyzed text, or several attributes for each entity;
- extraction of the attributes of an object;
Example: Becali a ajutat mult săracii 1/, [dar] nimeni nu a ştiut exact 2/ [cum] a făcut atâţia bani 3/. (En. - Becali helped the poor a lot 1/, [but] nobody knew exactly 2/ [how] he made so much money 3/.)
- extract and store all noun phrases (NPs);
- keep only the NPs whose frequency is above an experimentally learned threshold [Hu and Liu, 2004].
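The frequency filter in the last two steps can be sketched as below. The sketch assumes that noun phrases have already been extracted by a chunker (not shown), counts each NP once per review, and uses an arbitrary threshold; the function name and the data are invented.

```python
from collections import Counter


def frequent_features(noun_phrases_per_review, threshold):
    """Keep noun phrases that occur in more than `threshold` reviews."""
    counts = Counter()
    for phrases in noun_phrases_per_review:
        # Count each NP at most once per review, ignoring case.
        counts.update({p.lower() for p in phrases})
    return sorted(np for np, c in counts.items() if c > threshold)


# Invented example: NPs extracted from three product reviews.
reviews = [
    ["battery", "screen"],
    ["battery", "price"],
    ["Battery", "screen", "strap"],
]
print(frequent_features(reviews, threshold=1))
```

In practice the threshold would be tuned experimentally, as the slide notes, rather than fixed by hand.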

3. SA levels – sentiment lexicon
a) manual approaches: WordNet [Fellbaum, 1998], EuroWordNet [Vossen, 1998], BalkaNet [Tufiş et al., 2004]
Our work: AnaDiP-2010, inspired by LIWC-2007 [Pennebaker et al., 2001]: 9 emotional classes.
<classes>
 <class name="emotional" id="1"/>
 <class name="positive" id="2" parent="1"/>
 <class name="negative" id="3" parent="1"/>
 <class name="anxiety" id="4" parent="3"/>
 <class name="anger" id="5" parent="3"/>
 <class name="sadness" id="6" parent="3"/>
 <class name="spectacular" id="7" parent="2"/>
 <class name="firmness" id="8" parent="2"/>
 <class name="moderation" id="9" parent="2"/>
</classes>

3. SA levels – sentiment lexicon
Our software performs part-of-speech (POS) tagging and lemmatization of words. For example:
<lexic name="Politic" lang="ro">
 <word lemma="clevetitor" classes="1, 3, 6"/>
 <word lemma="genial" classes="1, 2, 7"/>
 …
</lexic>
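A lexicon in this XML format could be loaded and applied to lemmatized text roughly as follows. This is a sketch, not the authors' software: it assumes the `classes` attribute lists class ids from the `<classes>` definition, it embeds a shortened copy of both fragments, and the function names and input lemmas are invented.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Shortened copies of the two XML fragments shown on the slides.
CLASSES_XML = """<classes>
  <class name="emotional" id="1"/>
  <class name="positive" id="2" parent="1"/>
  <class name="negative" id="3" parent="1"/>
  <class name="sadness" id="6" parent="3"/>
  <class name="spectacular" id="7" parent="2"/>
</classes>"""

LEXICON_XML = """<lexic name="Politic" lang="ro">
  <word lemma="clevetitor" classes="1, 3, 6"/>
  <word lemma="genial" classes="1, 2, 7"/>
</lexic>"""


def load_class_names(xml_text):
    """Map class id -> class name."""
    return {c.get("id"): c.get("name") for c in ET.fromstring(xml_text).iter("class")}


def load_lexicon(xml_text):
    """Map lemma -> list of class ids."""
    lexicon = {}
    for word in ET.fromstring(xml_text).iter("word"):
        lexicon[word.get("lemma")] = [i.strip() for i in word.get("classes").split(",")]
    return lexicon


def count_classes(lemmas, lexicon, class_names):
    """Count how often each emotional class is triggered by the given lemmas."""
    counts = Counter()
    for lemma in lemmas:
        for class_id in lexicon.get(lemma, []):
            counts[class_names[class_id]] += 1
    return counts


class_names = load_class_names(CLASSES_XML)
lexicon = load_lexicon(LEXICON_XML)
# Assumes the input text has already been lemmatized.
counts = count_classes(["un", "discurs", "genial"], lexicon, class_names)
print(dict(counts))
```

Because each word lists its ancestor classes explicitly (for example 1, 2, 7), a hit on "spectacular" also counts towards "positive" and "emotional" without walking the `parent` links.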

3. SA levels – sentiment lexicon
b) corpus-based approaches – a set of words/phrases extracted from a relatively small corpus is extended by using a large corpus of documents from a single domain.
- a classical work [Hatzivassiloglou and McKeown, 1997] uses a set of linguistic connectors: şi, sau, nici, fie (En. - and, or, nor, either).
Examples:
bărbat puternic şi armonios / bărbat puternic şi armonios (En. - a strong and harmonious man)
femeie senzuală sau inteligentă? / femeie sărmană sau înstărită? (En. - a sensual or intelligent woman? / a poor or well-off woman?)
băiatul nu e nici prost, nici deștept... / băiatul nu e nici prost, nici urât... (En. - the boy is neither stupid nor smart... / the boy is neither stupid nor ugly...)
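The intuition behind connector-based lexicon expansion can be sketched as a simple propagation: adjectives joined by a connector that signals the same orientation inherit the polarity of a known seed. This is a strong simplification and not the method of the cited work; the seed labels, the pair list and the assumption that every listed pair shares orientation are invented for illustration.

```python
def propagate_polarity(seeds, conjoined_pairs):
    """Spread seed polarities across adjective pairs assumed to share orientation."""
    polarity = dict(seeds)
    changed = True
    while changed:  # repeat until no new adjective can be labelled
        changed = False
        for left, right in conjoined_pairs:
            for known, unknown in ((left, right), (right, left)):
                if known in polarity and unknown not in polarity:
                    polarity[unknown] = polarity[known]
                    changed = True
    return polarity


# Hypothetical seed lexicon and adjective pairs observed around connectors.
seeds = {"puternic": "positive", "prost": "negative"}
pairs = [
    ("puternic", "armonios"),  # from "puternic şi armonios"
    ("prost", "urât"),         # from "nici prost, nici urât"
    ("armonios", "deștept"),   # made-up pair, shows transitive propagation
]
expanded = propagate_polarity(seeds, pairs)
print(expanded)
```

Pairs such as "nici prost, nici deștept" join words of opposite orientation, so a realistic system has to decide for each pair whether the link is same-orientation or opposite-orientation before propagating.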

3. SA demo
I) Generic SA – for EN, different kinds of texts
https://app.monkeylearn.com/main/classifiers/cl_pi3C7JiL/
II) Tweet Sentiment – for EN
https://app.monkeylearn.com/main/classifiers/cl_qkjxv9Ly/
III) Product Sentiment – for EN, classifies product reviews and opinions in English as positive or negative
https://app.monkeylearn.com/main/classifiers/cl_TWmMTdgQ/
IV) Hotel Sentiment – for EN, distinguishes between good and bad hotel reviews
https://app.monkeylearn.com/main/classifiers/cl_rZ2P7hbs/
V) Restaurant Sentiment – for EN, distinguishes between good and bad restaurant reviews
https://app.monkeylearn.com/main/classifiers/cl_CsfDyd3m/
https://monkeylearn.com/sentiment-analysis/

3. SA demo
VI) Movies Sentiment – for EN, distinguishes between good and bad movie reviews
https://app.monkeylearn.com/main/classifiers/cl_MX2qQKNi/
VII) Airlines Sentiment – for EN, distinguishes between good and bad tweets about airlines
https://app.monkeylearn.com/main/classifiers/cl_qkjxv9Ly/
https://monkeylearn.com/sentiment-analysis/

4. Applications – business and government
“Why aren’t consumers buying our laptop?” when the price is good and the weight is clearly in line with consumers’ wishes [Lee, 2004].
Two kinds of answers:
- subjective reasons about intangible qualities (e.g. the physical keyboard is tacky), or
- misperceptions (even though they are wrong).
Solution: by tracking consumers’ opinions, one can predict trends in sales, etc. [Mishne & Glance, 2006].

4. Applications – business and government
Solution based on a dictionary + the semantic role of negations and pragmatic connectors:
- classification of emotionally charged words into two classes, positive and negative (optionally also a neutral class);
- more classes, associating each word with a value in the range -5 to +5;
- [Gîfu and Cristea, 2012a]: a scale over the interval -3 to +3;
- [Gîfu and Scutelnicu, 2013]: a scale of values from -1 to +1.
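The interplay between a valued dictionary, negation and the different scales can be sketched as below. The lexicon values, the negator list and the rule that a negator flips the next sentiment word are assumptions for illustration, not the rules used in the cited systems.

```python
NEGATORS = {"nu"}  # Romanian "not"; a fuller list would be needed in practice

# Hypothetical dictionary on the -5..+5 scale.
LEXICON = {"bun": 3, "odios": -5}


def score_tokens(tokens, lexicon=LEXICON):
    """Sum word values; a negator flips the sign of the next sentiment word."""
    total = 0
    negate = False
    for token in tokens:
        word = token.lower()
        if word in NEGATORS:
            negate = True
        elif word in lexicon:
            value = lexicon[word]
            total += -value if negate else value
            negate = False  # negation is consumed by the first sentiment word
    return total


def rescale(value, old_max, new_max):
    """Linearly map a value from [-old_max, old_max] to [-new_max, new_max]."""
    return value * new_max / old_max


print(score_tokens(["nu", "este", "bun"]))  # negated positive word
print(rescale(-5, old_max=5, new_max=1))    # -5..+5 scale mapped to -1..+1
```

Linear rescaling makes scores from the three scales mentioned above comparable, although dictionaries built on different scales may not agree on the relative strength of individual words.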

4. Process phases: POS-tagger & NER & Anaphora Resolution
<DOCUMENT>
 <P ID="1">
  <S ID="1">
   <W EXTRA="NotInDict" ID="11.1" LEMMA="" MSD="Vmip3s" Mood="indicative" Number="singular" POS="VERB" Person="third" Tense="present" Type="predicative" offset="0"></W>
   <NP HEADID="11.2" ID="0" ref="0">
    <W Case="direct" Gender="masculine" ID="11.2" LEMMA="nimic" MSD="Pz3msr" Number="singular" POS="PRONOUN" Person="third" Type="negative" offset="1">Nimic</W>
    <W ID="11.3" LEMMA="mai" MSD="Rg" POS="ADVERB" offset="7"> mai</W>
    <W Case="direct" Definiteness="no" Gender="masculine" ID="11.4" LEMMA="odios" MSD="Afpmsrn" Number="singular" POS="ADJECTIVE" offset="11"> odios</W>
    <W ID="11.5" LEMMA="," MSD="COMMA" POS="COMMA" offset="16">,</W>
    <W ID="11.6" LEMMA="mai" MSD="Rg" POS="ADVERB" offset="18"> mai</W>
    <W ID="11.7" LEMMA="oribil" MSD="Rg" POS="ADVERB" offset="22"> oribil</W>
    <W Case="direct" Definiteness="no" EXTRA="NotInDict" Gender="masculine" ID="11.8" LEMMA="decât" MSD="Afpmsrn" Number="singular" POS="ADJECTIVE" offset="29">decât</W>
   </NP>
   <NP HEADID="11.9" ID="1" ref="1">
    <W Case="direct" Definiteness="yes" Gender="masculine" ID="11.9" LEMMA="pantof" MSD="Ncmpry" Number="plural" POS="NOUN" Type="common" offset="35"> pantofii</W>
    <NP HEADID="11.10" ID="2" ref="2">
     <W Case="direct" Definiteness="no" Gender="masculine" ID="11.10" LEMMA="sport" MSD="Ncmsrn" Number="singular" POS="NOUN" Type="common" offset="44">sport</W>
     <W ID="11.11" LEMMA="cu" MSD="Sp" POS="ADPOSITION" offset="50"> cu</W>
     <NP HEADID="11.12" ID="3" ref="3">
      <W Case="direct" Definiteness="yes" Gender="feminine" ID="11.12" LEMMA="platformă" MSD="Ncfsry" Number="singular" POS="NOUN" Type="common" offset="53">platformă</W>
     </NP>
    </NP>
   </NP>
  </S>
 </P>
</DOCUMENT>
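Annotated output of this kind can be read back with a standard XML parser. The sketch below uses a shortened, well-formed sample in the same tag and attribute style; the sample itself and the variable names are assumptions, and only the attributes visible on the slide (`ID`, `LEMMA`, `POS`, `HEADID`) are used.

```python
import xml.etree.ElementTree as ET

# Shortened sample in the style of the annotated document shown above.
SAMPLE = """<DOCUMENT><P ID="1"><S ID="1">
<NP HEADID="11.9" ID="1" ref="1">
<W ID="11.9" LEMMA="pantof" MSD="Ncmpry" POS="NOUN"> pantofii</W>
<NP HEADID="11.10" ID="2" ref="2">
<W ID="11.10" LEMMA="sport" MSD="Ncmsrn" POS="NOUN">sport</W>
</NP>
</NP>
</S></P></DOCUMENT>"""

root = ET.fromstring(SAMPLE)

# (surface form, lemma, POS) for every word, in document order.
tokens = [((w.text or "").strip(), w.get("LEMMA"), w.get("POS"))
          for w in root.iter("W")]

# Resolve each noun phrase to the lemma of its head word via HEADID.
lemma_by_id = {w.get("ID"): w.get("LEMMA") for w in root.iter("W")}
np_heads = [lemma_by_id[np.get("HEADID")] for np in root.iter("NP")]

print(tokens)
print(np_heads)
```

The head lemmas of the noun phrases are natural candidates for the entities or features that a later sentiment step attaches opinions to.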

4. Process phases: POS-tagger & NER & Anaphora Resolution Fig. 3 The interface of the EAT system

4. Applications – business and government
- 46 rules for values.
<rule>
 <word attribute="LEMMA" value="cel"/>
 <word attribute="LEMMA" value="mai"/>
 <word attribute="POS" value="ADJECTIVE"/>
</rule>
Ex: cel mai bun
<rule>
 <word attribute="LEMMA" value="cel"/>
 <word attribute="LEMMA" value="mai"/>
 <word attribute="LEMMA" value="bun"/>
</rule>
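A rule of this form can be read as a sequence of attribute/value constraints on consecutive tokens. The matcher below is a minimal sketch of that reading, not the rule engine of the described system; the token dictionaries, the POS labels of the non-adjective tokens and the function name are hypothetical.

```python
# The superlative pattern from the slide: "cel" + "mai" + any adjective.
RULE = [("LEMMA", "cel"), ("LEMMA", "mai"), ("POS", "ADJECTIVE")]


def match_rule(tokens, rule):
    """Return start indices where consecutive tokens satisfy every constraint."""
    starts = []
    for start in range(len(tokens) - len(rule) + 1):
        window = tokens[start:start + len(rule)]
        if all(tok.get(attr) == value for tok, (attr, value) in zip(window, rule)):
            starts.append(start)
    return starts


# Hypothetical annotated tokens for "el este cel mai bun" (he is the best).
tokens = [
    {"LEMMA": "el", "POS": "PRONOUN"},
    {"LEMMA": "fi", "POS": "VERB"},
    {"LEMMA": "cel", "POS": "DETERMINER"},
    {"LEMMA": "mai", "POS": "ADVERB"},
    {"LEMMA": "bun", "POS": "ADJECTIVE"},
]
print(match_rule(tokens, RULE))
```

Once a match is found, the value assigned to the adjective can be adjusted, for instance strengthened for a superlative.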

4. Applications – review sites
- to assess the reviews and ratings about your company or yourself;
- to summarize reviews.
Our work: the consumer’s behaviour, civic identity [Gîfu et al., 2013]
- 6 profiles: the-decent, the-porn-aggressive, the-incitator, the-affected, the-author-attacker and supporter;
- we established a number of features (lexical, syntactic, semantic): style, emotional classes, etc.

4. Applications – politics/sociology
Two dimensions in politics:
1. to know what voters think about the political candidates [Efron, 2004; Goldberg et al., 2007; Laver et al., 2003; Mullen and Malouf, 2008];
2. to clarify the politicians’ positions in order to enhance the quality of the information that voters have access to [Bansal et al., 2008; Gîfu, 2013b].
In sociology:
- how ideas and innovations are propagated [Rosen, 1974].
Ex: the polls on different issues

5. SA Tutorials
for students in the humanities
for students in computer science
SA with Python
https://www.youtube.com/watch?v=e6xZAISu-5E
Code: https://github.com/jg-fisher/redditSentiment
SA APIs in Java
OpenNLP - supports the most common NLP tasks: tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, language detection and coreference resolution
https://opennlp.apache.org/
Stanford CoreNLP
https://stanfordnlp.github.io/CoreNLP/
SA with the Natural Language Toolkit (NLTK)
http://www.nltk.org/
WEKA - a set of tools for data pre-processing, classification, regression, clustering, association rules and visualization
https://www.cs.waikato.ac.nz/ml/weka/

CONCLUSIONS AND DISCUSSIONS
- SA is a complex task;
- SA is an emerging discipline with promising academic and, most importantly, industrial applications;
- the sentiment classification problem is more challenging.
Future work...
- to develop an independent sentiment classifier using machine learning methods;
- to compare the results obtained with machine learning on sentiment classification to those on traditional topic-based categorization;
- to analyse the sentiment lexicon of old Romanian in terms of diachronic semantics.

Thank you for your attention! ?