I Hate You, Computer ;) How Machine Learning is Transforming Sentiment Analysis & NLP. Katharine Jarmul
Hello! I am Katharine Jarmul. I love working with data in Python. Lots of data is text. I love working with text in Python… You can find me at: @kjam http://kjamistan.com
1. I am not a Machine Learning Expert. I don’t have a Ph.D. (in anything). I didn’t study astronomy or physics. I can’t derive equations on a board at will for you. I am *just* a Python developer.
2. Assumptions I’ve Made About You: You understand the basics of Machine Learning. You have used nltk or other libraries to perform Natural Language Processing. You are interested in what’s new / how to do more.
What We’ll Cover ◎Sentiment Analysis Fundamentals ◎Problems in Sentiment Analysis ◎Gensim, spaCy, TensorFlow, Theano ◎IBM’s Watson: Getting Emotional ◎New & Old Research ◎What’s Next?
What We Won’t Cover ◎The Python library you can download to determine sentiment automagically
3. Sentiment Analysis 101 What is sentiment analysis? How do we go about it in Python?
What is Sentiment Analysis RLLY?
How is Sentiment Analysis Used? ◎Brand, Politics, Engagement Analysis ◎User Satisfaction ◎Recommendation Systems ◎Targeted / Retargeted Marketing ◎Anomaly Detection ◎QA & User Experience
Basic Sentiment Analysis Steps (the Sentiment Analysis Process) ◎Choosing or Building a Lexicon, Labeling ◎Choosing Algorithm, Model Architecture ◎Choosing Parser, Preprocessing Data, Normalization / Standardization ◎Testing, Evaluation and Improvement
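The four steps above can be sketched end to end in plain Python. This is a minimal sketch under loud assumptions: the tiny `LEXICON` and its scores are made up, `normalize()` is deliberately crude, and "summing lexicon scores" stands in for a real model.

```python
import re

# Step 1: a tiny hand-made lexicon (hypothetical scores, for illustration only)
LEXICON = {"love": 1.0, "delicious": 1.0, "cool": 0.5,
           "hate": -1.0, "boring": -0.5, "stink": -1.0}

def normalize(text):
    """Step 2: lowercase and split into word tokens (minimal preprocessing)."""
    return re.findall(r"[a-z']+", text.lower())

def score(tokens):
    """Step 3: the 'algorithm' here is simply the sum of lexicon scores."""
    return sum(LEXICON.get(tok, 0.0) for tok in tokens)

def predict(text):
    """Map the summed score to positive (1) / negative (0); ties count as positive."""
    return 1 if score(normalize(text)) >= 0 else 0

def accuracy(examples):
    """Step 4: evaluate against hand-labeled (text, label) pairs."""
    correct = sum(predict(text) == label for text, label in examples)
    return correct / len(examples)

gold = [("I love this delicious pie", 1),
        ("Homework, I hate you, you stink", 0),
        ("What a boring talk", 0)]
print(accuracy(gold))  # 1.0
```

Each function maps to one step, so any of them can be swapped out (a better lexicon, a real tokenizer, a trained classifier) while the evaluation step stays the same.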
4. Choosing or Creating A Lexicon What do we need to have a good lexicon? Can we use lexicons from other fields?
Autotagging a Lexicon (Distant Supervision) Go, Bhayani and Huang, 2009
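Distant supervision in the style of Go, Bhayani and Huang (2009) uses emoticons as noisy labels: tweets with a happy emoticon are tagged positive, tweets with a sad one negative, and the emoticons are stripped so a classifier cannot simply memorize them. A minimal sketch, assuming the short emoticon lists below (the paper used its own, larger set):

```python
import re

# Assumed emoticon lists; extend them for real data.
POSITIVE_EMOTICONS = [":)", ":-)", ":D", "=)"]
NEGATIVE_EMOTICONS = [":(", ":-(", "=("]

def autotag(tweet):
    """Return (cleaned_text, label) with 1 = positive, 0 = negative,
    or None when the tweet has no emoticon or has both kinds."""
    has_pos = any(e in tweet for e in POSITIVE_EMOTICONS)
    has_neg = any(e in tweet for e in NEGATIVE_EMOTICONS)
    if has_pos == has_neg:  # neither or both: too ambiguous to label
        return None
    cleaned = tweet
    for e in POSITIVE_EMOTICONS + NEGATIVE_EMOTICONS:
        cleaned = cleaned.replace(e, "")  # remove the label source from the text
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return cleaned, 1 if has_pos else 0

tweets = ["Best talk ever :)", "My laptop died :(", "no emoticons here", "happy :) sad :("]
labeled = [r for r in map(autotag, tweets) if r is not None]
print(labeled)  # [('Best talk ever', 1), ('My laptop died', 0)]
```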
Choosing Labels: What do you want to show? ◎Positive, Negative (1, 0) ◎Positive, Negative, Neutral (1, -1, 0) ◎Positive, Negative, Neutral, Indeterminate ◎Positive → Negative (scale: -1, 1) ◎Positive → Negative (without normalized scale) ◎Categorizing by Emotion ◎Stance with Entity / Topic Detection
Tagging a Lexicon ◎Best-Worst ○ Choose the most positive and the most negative: ◉ #boring, (emoji), delicious, (emoji) ◎Sliding Scale ○ Rate this word / sentence / n-gram on a scale: ◉ Completely Negative - Neutral - Positive - Completely Positive ◎Simple Boolean ○ Negative / Positive (occasionally also Neutral)
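Best-worst annotations are usually turned into real-valued scores with a simple counting procedure: the share of times an item was picked as most positive minus the share of times it was picked as most negative. That gives a score in [-1, 1]. A sketch with made-up annotations:

```python
from collections import Counter

# Each annotation (made-up data): the items shown together, then the item chosen
# as most positive ("best") and the item chosen as most negative ("worst").
annotations = [
    (("#boring", "delicious", "meh", "awesome"), "awesome", "#boring"),
    (("#boring", "delicious", "meh", "awesome"), "delicious", "#boring"),
    (("delicious", "meh", "terrible", "ok"), "delicious", "terrible"),
]

def best_worst_scores(annotations):
    """Score = (times chosen best - times chosen worst) / times shown, in [-1, 1]."""
    best, worst, shown = Counter(), Counter(), Counter()
    for items, most_positive, most_negative in annotations:
        shown.update(items)
        best[most_positive] += 1
        worst[most_negative] += 1
    return {item: (best[item] - worst[item]) / shown[item] for item in shown}

scores = best_worst_scores(annotations)
print(scores["#boring"], scores["awesome"])  # -1.0 0.5
```

Annotators only ever make comparative judgments within one small tuple, and the counting turns those judgments into fine-grained scores for the whole lexicon.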
Finding Sentiment Agreement: Least Perceptible Difference Kiritchenko and Mohammad, 2016
SocialSent: Learning New Sentiment Lexicons from Unlabeled Corpora Hamilton, Clark, Leskovec and Jurafsky, 2016 http://nlp.stanford.edu/projects/socialsent/
SocialSent: r/programming (figure: words on a + / - sentiment axis: gold, minecraft, posix, 200, nginx, hacks, spaghetti, messy; python: -0.12, std dev: 1.57) Hamilton, Clark, Leskovec and Jurafsky, 2016 http://nlp.stanford.edu/projects/socialsent/
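SocialSent learns domain-specific scores by propagating sentiment from a few seed words over a word graph. The sketch below is a *simplified* random-walk propagation in that spirit, not the project's code. The 6-word similarity matrix, the seed words and `beta=0.9` are all assumptions. The real method builds the graph from word embeddings, and it also bootstraps and standardizes the scores.

```python
import numpy as np

words = ["good", "great", "clean", "messy", "bad", "terrible"]
# Toy symmetric similarity graph (made-up weights); SocialSent derives its graph from embeddings.
W = np.array([
    [0.0, 0.9, 0.6, 0.0, 0.0, 0.0],   # good
    [0.9, 0.0, 0.5, 0.0, 0.0, 0.0],   # great
    [0.6, 0.5, 0.0, 0.1, 0.0, 0.0],   # clean
    [0.0, 0.0, 0.1, 0.0, 0.6, 0.5],   # messy
    [0.0, 0.0, 0.0, 0.6, 0.0, 0.9],   # bad
    [0.0, 0.0, 0.0, 0.5, 0.9, 0.0],   # terrible
])

def random_walk(W, seed_idx, beta=0.9):
    """Visit probabilities of a random walk that restarts at the seed words with
    probability 1 - beta (personalized PageRank), solved in closed form."""
    T = W / W.sum(axis=1, keepdims=True)      # row-stochastic transition matrix
    s = np.zeros(len(W))
    s[seed_idx] = 1.0 / len(seed_idx)         # restart distribution over the seeds
    # p = beta * T^T p + (1 - beta) * s  =>  (I - beta * T^T) p = (1 - beta) * s
    return np.linalg.solve(np.eye(len(W)) - beta * T.T, (1 - beta) * s)

p_pos = random_walk(W, [words.index("good"), words.index("great")])
p_neg = random_walk(W, [words.index("bad"), words.index("terrible")])
polarity = dict(zip(words, p_pos / (p_pos + p_neg)))  # > 0.5 leans positive
print(round(polarity["clean"], 2), round(polarity["messy"], 2))
```

Words like "clean" and "messy" never appear in a seed list, yet they inherit a polarity from their neighbours. This is how a community-specific lexicon (r/programming's "spaghetti", for example) can emerge from unlabeled text.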
5. Choosing An Approach / Algorithm How do we determine positive or negative? What machine learning systems can we use?
Sentiment Analysis ML Approaches / Models ◎Bag of Words (BOW) ◎Term Frequency-Inverse Document Frequency (TF-IDF) ◎Word embedding aggregations / BOW ◎Doc2vec with tagging ◎MV-RNN / RNTN: Sentence trees ◎CNN / RNN with Word (or Doc) Embeddings ◎Semisupervised Learning ◎Finely Tuned “state-of-the-art” systems
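The oldest approaches in this list, bag of words and TF-IDF, take only a few lines with scikit-learn. A minimal sketch, assuming scikit-learn is installed. The six-sentence training set is made up, and logistic regression is just one reasonable classifier choice:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up labeled data: 1 = positive, 0 = negative
texts = ["I love this talk", "what a delicious lunch", "great slides, thanks",
         "I hate homework", "this is boring", "you stink, computer"]
labels = [1, 1, 1, 0, 0, 0]

# TF-IDF over unigrams and bigrams, fed into a linear classifier
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)
print(model.predict(["I love lunch", "boring homework"]))
```

Swapping `TfidfVectorizer` for `CountVectorizer` gives plain bag-of-words counts. The rest of the pipeline stays the same, which makes the two easy to compare as baselines.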
Sentiment Analysis Approaches: Oldies vs. Newbies ◎Bag of Words vs. Word Embeddings ◎TF-IDF vs. Doc2vec ◎unigrams vs. skip-grams / dependencies ◎supervised vs. semisupervised
State of the Art Sentiment Features Tang, Wei, Qin, Liu and Zhou, 2014 (orig. Mohammad et al., 2013)
Gensim and Word2vec / Doc2vec ◎What is Word2vec? Doc2vec? ◎Skip-Gram or Continuous Bag of Words? ◎Word Embeddings: A mathematical basis for natural language processing (moving beyond the counts) ◎Gensim library: Helping Python Developers with Word Embeddings since 2010 ◎glove-python available for a GloVe implementation
Word Embeddings: Different Methods, Different Results Levy & Goldberg, 2014
Online Comparison: Word Embedding Methods http://irsrv2.cs.biu.ac.il:9998/?word=python
Word Embeddings... Not Neutral!! ◎King - Man + Woman = Queen ◎Doctor - Man + Woman = Gynecologist / Nurse ◎Professor - Man + Woman = Asst. Professor ◎Professor - Woman + Man = Professor Emeritus ◎Try a most-similar search (topn=30) for: Mexicans, immigrants, Negroes (yes, it exists), Asian and Jews (Bolukbasi, Chang, Zou, Saligrama and Kalai, 2016)
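The "King - Man + Woman" arithmetic is a nearest-neighbour search around the summed vectors, excluding the query words. The sketch below uses hand-made 3-d toy vectors (assumed axes: royalty, gender, food), not real embeddings. On real vectors, gensim's `KeyedVectors.most_similar(positive=[...], negative=[...], topn=...)` runs a comparable search over unit-normalized vectors. The same arithmetic returns whatever associations, biased ones included, the training text contained.

```python
import numpy as np

# Hand-made toy vectors (assumed axes: royalty, gender, food); not real embeddings.
vectors = {
    "king":  np.array([0.9,  0.8, 0.1]),
    "queen": np.array([0.9, -0.8, 0.1]),
    "man":   np.array([0.1,  0.8, 0.0]),
    "woman": np.array([0.1, -0.8, 0.0]),
    "apple": np.array([0.0,  0.0, 1.0]),
}

def analogy(positive, negative, vectors):
    """Return the word whose vector has the highest cosine similarity to
    sum(positive) - sum(negative), excluding the query words themselves."""
    target = sum(vectors[w] for w in positive) - sum(vectors[w] for w in negative)
    target = target / np.linalg.norm(target)
    candidates = [w for w in vectors if w not in positive and w not in negative]
    return max(candidates,
               key=lambda w: float(vectors[w] @ target / np.linalg.norm(vectors[w])))

print(analogy(["king", "woman"], ["man"], vectors))  # queen
```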
Gensim with Labels: Word2vec Great examples: ◎ http://linanqiu.github.io/2015/10/07/word2vec-sentiment/ ◎ https://districtdatalabs.silvrback.com/modern-methods-for-sentiment-analysis
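One way to use Word2vec vectors with labels is to aggregate (average) the word vectors of each text and train any classifier on the result. A minimal sketch, assuming you already have a token-to-vector mapping (for instance `model.wv` from a trained gensim model). Here a made-up 2-d dict stands in for it, and a nearest-centroid rule stands in for a real classifier:

```python
import numpy as np

# Made-up 2-d vectors standing in for a trained model's word vectors
EMB = {"love": np.array([0.9, 0.1]), "great": np.array([0.8, 0.2]),
       "hate": np.array([-0.9, 0.1]), "boring": np.array([-0.7, 0.3]),
       "talk": np.array([0.0, 1.0]), "homework": np.array([-0.1, 0.9])}

def doc_vector(tokens, emb, dim=2):
    """Average the vectors of known tokens; return zeros if none are known."""
    known = [emb[t] for t in tokens if t in emb]
    return np.mean(known, axis=0) if known else np.zeros(dim)

train = [(["love", "talk"], 1), (["great", "talk"], 1),
         (["hate", "homework"], 0), (["boring", "homework"], 0)]
X = np.array([doc_vector(tokens, EMB) for tokens, _ in train])
y = np.array([label for _, label in train])

# Nearest-centroid classifier: a simple stand-in for e.g. logistic regression
centroids = {label: X[y == label].mean(axis=0) for label in (0, 1)}

def predict(tokens):
    v = doc_vector(tokens, EMB)
    return min(centroids, key=lambda label: np.linalg.norm(v - centroids[label]))

print(predict(["love", "homework"]), predict(["boring"]))  # 1 0
```

Averaging throws away word order, so "not good" and "good, not" look identical to the classifier. This is one reason the later slides move toward sentence trees, CNNs and RNNs.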
Gensim with Labels: Doc2vec Labeled Sentences Great examples: ◎ http://linanqiu.github.io/2015/10/07/word2vec-sentiment/ ◎ https://districtdatalabs.silvrback.com/modern-methods-for-sentiment-analysis
Choosing n-gram length (RNTN) Socher, Perelygin, Wu, Chuang, Manning, Ng and Potts, 2013
LSTM with Theano / TensorFlow (DeepLearning4J)
CNN: Moving from Images to Text http://www.wildml.com/2015/12/implementing-a-cnn-for-text-classification-in-tensorflow/
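The core idea behind CNNs for text is to slide filters over windows of consecutive word embeddings and then keep each filter's strongest activation anywhere in the sentence (max-over-time pooling). The sketch below is a forward pass only, in plain NumPy with random, untrained weights. The sizes (7 tokens, 4-d embeddings, 3 filters each of width 2 and 3) are assumptions. The linked tutorial builds the trainable version in TensorFlow.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, embed_dim, n_filters = 7, 4, 3      # toy sizes (assumed)
X = rng.normal(size=(n_tokens, embed_dim))    # one row per token: stand-in for looked-up embeddings

def conv_maxpool(X, W, b):
    """Slide each filter over every window of consecutive tokens, apply ReLU,
    then keep the maximum activation per filter (max-over-time pooling)."""
    width = W.shape[1]
    feature_map = np.array([
        np.maximum(0.0, np.tensordot(W, X[i:i + width], axes=([1, 2], [0, 1])) + b)
        for i in range(X.shape[0] - width + 1)
    ])                                        # shape: (n_windows, n_filters)
    return feature_map.max(axis=0)            # shape: (n_filters,)

# Filters spanning 2 and 3 tokens (bigram- and trigram-like detectors), random here
features = np.concatenate([
    conv_maxpool(X, rng.normal(size=(n_filters, width, embed_dim)), np.zeros(n_filters))
    for width in (2, 3)
])
print(features.shape)  # (6,)
```

The concatenated `features` vector has a fixed length regardless of sentence length. That is what lets a final dense/softmax layer turn it into a sentiment prediction.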
Deep Learning: Labeled Text Great examples: ◎ http://deeplearning.net/tutorial/lstm.html ◎ https://github.com/jpmcd/TensorflowSentiment ◎ https://github.com/dennybritz/cnn-text-classification-tf
Semisupervised Processes with Distant Supervision: Lexicon / Autotagging / Input → Simple Classifier → Deep Learning System (RNN, CNN, ANN)
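The first two stages of this pipeline can be sketched with scikit-learn: a seed lexicon autotags unlabeled text, and a simple classifier trained on those noisy labels learns words the lexicon never contained. The seed words, texts and the Naive Bayes choice are all assumptions. The final deep learning stage, trained on the autotagged or classifier-labeled data, is not shown.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Step 1: a tiny seed lexicon autotags unlabeled text (words and scores are made up).
SEED_LEXICON = {"love": 1, "great": 1, "hate": -1, "boring": -1}

def autotag(text):
    """Return 1 (positive), 0 (negative), or None when the lexicon cannot decide."""
    total = sum(SEED_LEXICON.get(word, 0) for word in text.lower().split())
    return None if total == 0 else int(total > 0)

unlabeled = ["I love python and pandas", "great docs for pandas",
             "I hate merge conflicts", "boring merge conflicts again",
             "just a tuesday"]
tagged = [(text, autotag(text)) for text in unlabeled]
train = [(text, label) for text, label in tagged if label is not None]

# Step 2: a simple classifier trained on the autotagged data generalizes
# to words that are not in the lexicon ("pandas", "merge", "conflicts").
texts, labels = zip(*train)
clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(texts, labels)
print(clf.predict(["pandas release", "merge conflicts"]))  # [1 0]
```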
6. Regularization, Preprocessing and Parsing To tag or not to tag? To preprocess or not to preprocess?
Preprocessing is essential (or is it?) ◎Lemmatization ◎Removing “stop” words, rare words ◎Removing duplicate punctuation/letters, or creating symbols to represent groups (cooll rather than every variation of coollllll, etc.) ◎Spellcheck ◎Caution: use no (or minimal) preprocessing as a baseline; your processing *may* harm your sentiment model
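Two of these steps fit in a few lines of `re`: collapsing elongated words (so every variation of "coollllll" becomes "cooll") and dropping stop words. A minimal sketch, assuming the tiny `STOP_WORDS` set below. Fuller lists, such as nltk's English stop words, include negations like "not", which is one concrete way preprocessing can hurt a sentiment model. Keep `remove_stop_words=False` around as the baseline the caution above recommends.

```python
import re

STOP_WORDS = {"the", "a", "an", "is", "so"}  # tiny illustrative list (assumed)

def squeeze_repeats(text):
    """Collapse runs of 3+ identical characters to two: 'coollllll' -> 'cooll', '!!!!' -> '!!'."""
    return re.sub(r"(.)\1{2,}", r"\1\1", text)

def preprocess(text, remove_stop_words=True):
    """Lowercase, squeeze repeats, split into words and punctuation runs, optionally drop stop words."""
    tokens = re.findall(r"\w+|[^\w\s]+", squeeze_repeats(text.lower()))
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens

print(preprocess("The talk was sooooo coollllll!!!!"))
# ['talk', 'was', 'soo', 'cooll', '!!']
```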
Sense2vec and spaCy ◎Google_ORG vs. Google_VERB ◎Large Reddit Corpus
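The sense2vec trick is a preprocessing step: fuse each token with its part-of-speech or entity label before training word vectors, so "Google" the company and "google" the verb get separate vocabulary entries. A sketch with hard-coded, hypothetical labels that follow the slide's `Google_ORG` notation. With spaCy, the labels would come from `token.pos_` or, for named entities, `ent.label_`.

```python
# Hypothetical (word, label) pairs; spaCy would normally supply these labels.
tagged_sentences = [
    [("I", "PRON"), ("Google", "VERB"), ("everything", "PRON")],
    [("Google", "ORG"), ("released", "VERB"), ("TensorFlow", "ORG")],
]

def sense_tokens(sentence):
    """Fuse each word with its label so each sense becomes its own vocabulary entry."""
    return [f"{word}_{label}" for word, label in sentence]

corpus = [sense_tokens(s) for s in tagged_sentences]
print(corpus[0][1], corpus[1][0])  # Google_VERB Google_ORG
```

The resulting `corpus` is just lists of string tokens, so it can go straight into a word-embedding trainer such as gensim's `Word2Vec`.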
7. IBM Watson: Getting Emotional Can emotional tagging show us a wider range of sentiment?
Varied Emotions, Varied Sentiment
IBM Watson: Analyzing Emotion & Tone ◎https://tone-analyzer-demo.mybluemix.net/ ◎Uses AI to categorize: ○ Anger ○ Joy ○ Fear ○ Disgust ○ Sadness ◎Social Tendencies and Language Style analysis also offered
8. What’s Unsolved? Undetermined? What does the latest research say? What hasn’t been touched upon?
Humo(u)r
Negation (uni/bigram) Zhu, Guo, Mohammad and Kiritchenko, 2014
Stanford Sentiment Trees: Negation Socher, Perelygin, Wu, Chuang, Manning, Ng and Potts, 2013
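A common baseline for negation in bag-of-words models, a version of which appears in the NRC-Canada system of Mohammad et al. (2013), is to mark every token in a "negated context" so that "good" and "good_NEG" become different features. A minimal sketch, assuming the short negator list below and the rule that a negated context ends at clause-level punctuation. It is not the method evaluated by Zhu et al. (2014) or the Stanford trees.

```python
import re

NEGATORS = {"not", "no", "never", "cannot"}  # assumed list; words ending in "n't" are also caught
CLAUSE_END = re.compile(r"^[.,:;!?]+$")

def mark_negation(tokens):
    """Append _NEG to every token between a negator and the next clause-level punctuation."""
    out, negated = [], False
    for tok in tokens:
        if CLAUSE_END.match(tok):
            negated = False          # punctuation closes the negated context
            out.append(tok)
        elif negated:
            out.append(tok + "_NEG")
        else:
            out.append(tok)
            if tok.lower() in NEGATORS or tok.lower().endswith("n't"):
                negated = True       # start marking from the next token
    return out

print(mark_negation("this is not a good talk , but the slides are nice".split()))
# ['this', 'is', 'not', 'a_NEG', 'good_NEG', 'talk_NEG', ',', 'but', 'the', 'slides', 'are', 'nice']
```

The heuristic is blunt: it cannot tell that "not bad" is mildly positive, or that negation scope does not always end at a comma. The tree-based models above try to address exactly these cases.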
What’s (Still) Hard? ◎Negation ◎Sarcasm & Irony ◎Mixed Sentiment ◎Complex Emotions ◎Colloquialism, Slang, New / Old Phrases ◎Speaker Intention / Personality ◎General NLP Issues: ○ Misspellings, Acronyms, Hashtags, Lolspeak, Choosing Context, Never-Before-Seen Words
Cultural References
Speaking via Images and GIFs
Densify: Are Dense Word Embeddings A Solution for Speed? Rothe, Ebert and Schütze, 2016
Mo’ Solutions, Mo’ Problems ◎Deep Learning at Scale and Speed ◎Integration of different social media into input streams ○ Different Lexicon? Corpora? Embeddings? ◎Advanced Word Embeddings / Lexicon: ○ Aggregation Points: By Buyer Location, By Demographic ◎Irony and Mixed Sentiment Detection ◎Ensemble Methods ◎Character-level Embeddings
TL;DR: Have we “solved” Sentiment Analysis? ◎Short answer: Not completely. ◎Longer answer: It depends on what you are trying to do, how you define sentiment analysis and how much data you have (both in your Lexicon and your active data for testing). Deep Learning may unlock some of the nuances and stumbling blocks!
Thanks! Any questions? Reach out! @kjam katharine@kjamistan.com Presentation template by SlidesCarnival