This talk demonstrates basic consumer survey analysis using the Text Explorer platform in JMP Pro. It also shows an approach for text data (product reviews) that come with an overall score (rating), using that score to help prepare the data.
Text curation (with help from the Genreg platform)
A. Narayanan & Scott Reese, P&G Data & Modeling Science, Oct. 2019
Agenda • What’s the business question? • Process for exploration • Cleanup steps • Recoding • Stemming • Add Phrase • Stopwords (Genreg help) • Topic review • Compare back to comments for context • Key topic focus (Genreg help) • How it has been used and Final tips
What are the business questions? • What are users talking about? • What are themes related to higher and lower ratings? • How to prioritize themes?
Overall process • Get data into JMP • Analyze > Text Explorer • Data cleanup • Recoding • Stemming (achieved in this example via recoding) • Add Phrase • Stopwords (using Genreg) • Topic Exploration • Focus on most impactful topics (using Genreg) • Repeat from Data cleanup as needed
Challenges with cleanup • Bag of words approach: • JMP does not care about the meaning of the words. • Only counts the words. • Of course, how it counts matters (binary, frequency, log, TF-IDF). • Robustness of cleanup varies by user, e.g., due to available resources, category knowledge, business objective.
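The four counting options named above can be sketched in a few lines of Python. This is a minimal illustration using common textbook definitions (base-10 logs, TF-IDF as term count times log of inverse document frequency); those formulas and the sample comments are assumptions, and JMP's exact weighting may differ.

```python
import math
from collections import Counter

# Hypothetical comments, already lowercased and split on whitespace.
docs = [
    "great hold great smell".split(),
    "hold is weak".split(),
    "smell is great".split(),
]

def weights(docs, scheme):
    """Return one {term: weight} dict per document for the chosen scheme."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    rows = []
    for doc in docs:
        row = {}
        for term, tf in Counter(doc).items():
            if scheme == "binary":
                row[term] = 1                      # present or not
            elif scheme == "frequency":
                row[term] = tf                     # raw count
            elif scheme == "log":
                row[term] = math.log10(1 + tf)     # damped count (assumed form)
            elif scheme == "tfidf":
                # Assumed textbook form: count * log10(N / document frequency).
                row[term] = tf * math.log10(n_docs / doc_freq[term])
            else:
                raise ValueError(f"unknown scheme: {scheme}")
        rows.append(row)
    return rows

for scheme in ("binary", "frequency", "log", "tfidf"):
    print(scheme, weights(docs, scheme)[0])
```

In every scheme the word "great" is just a token; nothing in the counts knows it is positive, which is the bag-of-words limitation described above.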
Recoding • Because JMP is counting terms, spelling variants of the same word are counted as separate terms. • This counting does not group synonyms or words with similar ideas. • Recode is a tool to modify how a word is counted. • Allows a word to be counted as another. • The automatic grouping tool is very helpful, but requires oversight. • Useful for category-specific synonyms.
Comment on Recoding • The order of the recoding step is important. • There is an order of operations for terms and phrases that must be respected in order to function as intended. • Recode first • Add Phrases second • Recode Phrases as needed
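A small Python sketch of why the order matters: phrases are matched on the terms as they stand at that moment, so a phrase defined on recoded terms is only found if recoding has already happened. The recode map, phrase list, and sample comment are hypothetical, and this illustrates the general idea rather than JMP's internal behavior.

```python
# Hypothetical recode map: misspellings and synonyms mapped to one term.
RECODE = {"smel": "smell", "scent": "smell", "fragrance": "smell", "holds": "hold"}
# Hypothetical two-word phrases, written with the already-recoded terms.
PHRASES = [("strong", "hold"), ("nice", "smell")]

def recode(tokens, mapping=RECODE):
    """Replace each token by its recoded form, if one is defined."""
    return [mapping.get(tok, tok) for tok in tokens]

def add_phrases(tokens, phrases=PHRASES):
    """Merge adjacent tokens that form a known two-word phrase."""
    out, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in phrases:
            out.append(" ".join(pair))
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

tokens = "nice scent and strong holds".split()
recode_first = add_phrases(recode(tokens))    # recommended order
phrases_first = recode(add_phrases(tokens))   # phrases never match
print(recode_first)   # ['nice smell', 'and', 'strong hold']
print(phrases_first)  # ['nice', 'smell', 'and', 'strong', 'hold']
```

With phrases applied first, "nice scent" and "strong holds" do not match any phrase, so the phrase terms are silently lost.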
Stopwords • Most of the terms are not useful for taking action or deciphering intent. • For example, Hair, Product, Pomade. • Left with Hold and Smell, the primary benefits we would expect to hear about. Original approach:
Approach: Genreg to identify stopwords • Genreg used to help identify stopwords. • Estimation method options. • Current recommendation is Elastic Net. • Keep Adaptive checked. Still requires iteration… but much less.
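To show why a penalized regression helps here, the following is a minimal pure-Python elastic net fitted by coordinate descent on a tiny hypothetical document-term matrix. It is a sketch under stated assumptions, not Genreg itself: the data, the penalty values `lam` and `alpha`, and the function names are made up, the adaptive weighting recommended above is omitted, and no p-values are computed.

```python
from itertools import product

def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def elastic_net(X, y, lam=0.1, alpha=0.5, n_iter=100):
    """Minimize (1/2n)*RSS + lam*(alpha*L1 + (1-alpha)/2*L2) by coordinate descent."""
    n, p = len(X), len(X[0])
    # Center columns and response so no intercept term is needed.
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    cols = [[row[j] - col_means[j] for row in X] for j in range(p)]
    resid = [v - y_mean for v in y]          # residual for beta = 0
    beta = [0.0] * p
    for _ in range(n_iter):
        for j, col in enumerate(cols):
            scale = sum(c * c for c in col) / n
            if scale == 0.0:
                continue                      # constant column carries no signal
            # Covariance of column j with the residual that excludes its own effect.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, resid)) / n
            new = soft_threshold(rho, lam * alpha) / (scale + lam * (1 - alpha))
            delta = new - beta[j]
            resid = [r - c * delta for c, r in zip(col, resid)]
            beta[j] = new
    return beta

# Hypothetical binary document-term matrix: every combination of three terms.
terms = ["hold", "smell", "hair"]
X = [list(flags) for flags in product([0, 1], repeat=3)]
# Hypothetical ratings driven by "hold" and "smell" only.
y = [1 + 3 * hold + 1 * smell for hold, smell, hair in X]

beta = elastic_net(X, y)
for term, b in zip(terms, beta):
    print(f"{term:6s} {b: .3f}")
```

The term that carries no information about the rating ("hair") ends with a coefficient of exactly zero, which is the kind of term that becomes a stopword candidate.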
Details: JMP screen sharing • Review parameter estimate table • Make into data table • Recode terms as necessary (to remove the TF IDF or Binary identifier) • Sort by p-values • Copy list of terms beyond the desired p-value range • Go back to Text Explorer window • Manage Stopwords • Paste in list of terms from parameter estimate table • Save script with new name
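The table steps above (strip the identifier, sort by p-value, copy the terms past the cutoff) amount to a small filter. The Python sketch below assumes hypothetical rows, an assumed form for the identifier suffix, and a hypothetical cutoff of 0.05; the talk does not state a specific p-value range.

```python
# Hypothetical rows from a parameter estimate table: (term label, p-value).
estimates = [
    ("hold TF IDF", 0.0004),
    ("smell TF IDF", 0.0120),
    ("hair TF IDF", 0.6400),
    ("product Binary", 0.8100),
    ("pomade TF IDF", 0.4700),
]
# Assumed form of the weighting identifier appended to each term name.
SUFFIXES = (" TF IDF", " Binary")

def strip_identifier(label):
    """Remove the weighting identifier so only the term itself remains."""
    for suffix in SUFFIXES:
        if label.endswith(suffix):
            return label[: -len(suffix)]
    return label

def stopword_candidates(rows, p_cutoff=0.05):
    """Sort by p-value and keep the terms whose p-value exceeds the cutoff."""
    ranked = sorted(rows, key=lambda row: row[1])
    return [strip_identifier(label) for label, p in ranked if p > p_cutoff]

candidates = stopword_candidates(estimates)
print("\n".join(candidates))  # one term per line, ready to paste into a stopword list
```

The candidate list still deserves a human read before it is pasted in, since a term with a weak relationship to rating can still matter for context.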
Starting number of topics • Create new DTM with cleaned up term list • Run Latent Semantic Analysis, SVD • Review Singular Values table • Target ~70% of Cum Percent • This will be MANY topics • Run Topic Analysis SVD • Enter the Target number of topics • Save Document Topic Vectors
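Reading the target number of topics off the Singular Values table is a simple lookup, sketched here in Python. The Cum Percent values are hypothetical and stand in for the column JMP reports; the helper name `topics_for_target` is made up for illustration.

```python
# Hypothetical "Cum Percent" column from a Singular Values table.
cum_percent = [18.2, 29.5, 38.1, 45.0, 51.3, 56.9, 61.8, 66.0, 69.7, 72.9, 75.6]

def topics_for_target(cum_percent, target=70.0):
    """Smallest number of singular vectors whose cumulative percent reaches target."""
    for count, value in enumerate(cum_percent, start=1):
        if value >= target:
            return count
    return len(cum_percent)  # target never reached: keep everything listed

n_topics = topics_for_target(cum_percent)
print(n_topics)
```

The resulting count is what would be entered as the target number of topics in the Topic Analysis step.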
Genreg to focus on key topics • Number of topics will likely be higher than practical for immediate business action • Run Genreg • Target variable = Rating (continuous) • Model effects = Document Topic Vectors • Turn on Profiler • Red Triangle > Assess Variable Importance • Prediction Profiler Red Triangle > Colorize Profiler and Reorder
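As a rough stand-in for the prioritization idea, the Python sketch below ranks topics by the absolute Pearson correlation between each document topic vector and the rating. This is a deliberately simpler technique than Genreg with Assess Variable Importance, swapped in only to illustrate the ranking; the ratings and topic scores are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

# Hypothetical ratings and saved document topic vectors (one score per document).
ratings = [5, 4, 4, 2, 1, 1]
topic_vectors = {
    "Topic 1": [0.9, 0.7, 0.8, 0.2, 0.1, 0.0],
    "Topic 2": [0.1, 0.3, 0.0, 0.2, 0.3, 0.1],
    "Topic 3": [0.1, 0.3, 0.1, 0.6, 0.9, 0.7],
}

# Strongest relationship with rating first, regardless of direction.
ranked = sorted(
    topic_vectors,
    key=lambda name: abs(pearson(topic_vectors[name], ratings)),
    reverse=True,
)
for name in ranked:
    print(name, round(pearson(topic_vectors[name], ratings), 2))
```

The sign matters for interpretation: a strongly negative topic points to themes associated with lower ratings, which may be the first place to act.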
Review identified topics • Sample screenshot: • Review topics • Can you name them? • Do they make sense? • Is it actionable? • Do you need to conduct another round of cleanup?
Focus action on identified and reviewed topics • Examples of use: • Identification of attribute areas needing further evaluation. • Alignment on language used by consumers. • Prioritization of topic areas.
Final tips: • Save various versions of your TE script, with informative titles (Recode completed, Phrases added, Genreg Stopwords…). • Save your file before running Genreg! It can become memory intensive, and it is not fun to start over with an analysis. • Recognize that this approach is just ONE APPROACH. • Exploration should be a team sport, and multiple viewpoints are often helpful. • Keep in mind the big picture. • This is still part art and part stats, and JMP Text Explorer is a HUGE improvement over previous approaches.
Questions?