Text Analytics World Current Applications and Future Directions

Text Analytics World
Current Applications and Future Directions of Text Analytics
Tom Reamy
Chief Knowledge Architect, KAPS Group
Program Chair – Text Analytics World
Knowledge Architecture Professional Services
http://www.kapsgroup.com

Agenda
§ Introduction: Current State of Text Analytics
  – Survey / Discussion Themes
§ Enterprise Text Analytics
  – Search – still fundamental
  – Shift from information to business
§ Social Media – Next Generation
  – Text Analytics and CRM Integration
  – Text and Data, Enterprise and Social
§ Future of Text Analytics
  – Roadblocks, Deep Vision
§ Questions

Introduction: KAPS Group
§ Knowledge Architecture Professional Services – Network of Consultants
§ Applied Theory – Faceted taxonomies, complexity theory, natural categories, emotion taxonomies
§ Services:
  – Strategy – IM & KM - Text Analytics, Social Media, Integration
  – Taxonomy/Text Analytics development, consulting, customization
  – Text Analytics Quick Start – Audit, Evaluation, Pilot
  – Social Media: Text based applications – design & development
§ Partners – SAS, Smart Logic, Expert Systems, SAP, IBM, FAST, Concept Searching, Attensity, Clarabridge, Lexalytics
§ Projects – Portals, taxonomy, Text analytics – news, expertise location, information strategy, text analytics evaluation, Quick Start in Text A.
§ Clients: Genentech, Novartis, Northwestern Mutual Life, Financial Times, Hyatt, Home Depot, Harvard Business Library, British Parliament, Battelle, Amdocs, FDA, GAO, World Bank, etc.
§ Presentations, Articles, White Papers – www.kapsgroup.com

Text Analytics World: Current State of Text Analytics
§ History – academic research, focus on NLP
§ Inxight – out of Xerox PARC
  – Moved TA from academic and NLP to auto-categorization, entity extraction, and Search-Meta Data
§ Explosion of companies – many based on Inxight extraction with some analytical-visualization front ends
  – Half from 2008 are gone - Lucky ones got bought
§ Early applications – News aggregation and Enterprise Search
  – Second Wave = shift to sentiment analysis
§ Enterprise search – 30-50% of market ($1 Bil)
§ Text Analytics is growing 20% a year, 10% of analytics
§ Fragmented market – no clear leader

Text Analytics World: Current State of Text Analytics: Vendor Space
§ Taxonomy Management – SchemaLogic, Pool Party
§ From Taxonomy to Text Analytics – Data Harmony, Multi-Tes
§ Extraction and Analytics – Linguamatics (Pharma), Temis, whole range of companies
§ Business Intelligence – Clear Forest, Inxight
§ Sentiment Analysis – Attensity, Lexalytics, Clarabridge
§ Open Source – GATE
§ Stand alone text analytics platforms – IBM, SAS, SAP, Smart Logic, Expert System, Basis, Open Text, Megaputer, Temis, Concept Searching
§ Embedded in Content Management, Search – Autonomy, FAST, Endeca, Exalead, etc.

Interviews with Leading Vendors, Analysts: Current Trends
§ From Mundane to Advanced – reducing manual labor to “Cognitive Computing”
§ Enterprise – Shift from Information to Business – cost cutting rather than productivity gains
§ Integration – data and text, text analytics and analytics
  – Social Media – explosion of wild text, combine with data – customer browsing behavior, web analytics
§ Big Data – more focus on extraction (where it began) but categorization adds depth and sophistication
§ Shift away from IT – compliance, legal, advertising, CRM
§ US market different than Europe/Asia – project oriented

Enterprise Text Analytics
§ Search is still #1 = 30-50% of applications
§ New Standard Search – facets (more and more metadata), autocategorization built on taxonomies, clustering
  – Issue – consistent metadata, multiple content sources
§ Trend = Text Analytics/Search as Semantic Infrastructure
  – Platform for Info Apps (Search-based applications)
§ SharePoint – Major focus of TA companies – fix problems with taxonomy/folksonomy
  – Hybrid workflow – Publish document -> TA analysis -> suggestions for categorization, entities, metadata -> present to author
§ External information = more automation, extraction – precision more important
§ Use of predictive facets, enhanced relevance (FAST)

Enterprise Text Analytics: Adding Structure to Unstructured Content
§ Beyond Documents – categorization by corpus, by page, sections or even sentence or phrase
§ Documents are not unstructured – variety of structures
  – Sections – Specific - “Abstract” to Function “Evidence”
  – Corpus – document types/purpose
  – Textual complexity, level of generality
§ Need to develop flexible categorization and taxonomy – tweets to 200 page PDF
§ Applications require sophisticated rules, not just categorization by similarity

Enterprise Text Analytics: Document Type Rules
§ (START_2000, (AND, (OR, _/article:"[Abstract]", _/article:"[Methods]"),
  (OR, _/article:"clinical trial*", _/article:"humans",
  (NOT, (DIST_5, (OR, _/article:"approved", _/article:"safe", _/article:"use", _/article:"animals"),
§ If the article has sections like Abstract or Methods AND has phrases around “clinical trials / Humans” and not words like “animals” within 5 words of “clinical trial” words – count it and add up a relevancy score
§ Primary issue – major mentions, not every mention
  – Combination of noun phrase extraction and categorization
  – Results – virtually 100%
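The logic of the document type rule above can be approximated in plain Python. This is only a sketch of the idea, not the rule engine behind the slide: it assumes START_2000 means "look at the first 2000 tokens" and DIST_5 means "within 5 tokens", and the function names (`tokenize`, `trial_positions`, `score_article`) are hypothetical.

```python
import re

SECTION_MARKERS = {"[abstract]", "[methods]"}
EXCLUDED_TERMS = {"approved", "safe", "use", "animals"}


def tokenize(text, limit):
    """Lowercase word tokens, keeping [Section] markers, cut to `limit` tokens."""
    return re.findall(r"\[\w+\]|\w+", text.lower())[:limit]


def trial_positions(tokens):
    """Indexes where 'clinical trial*' or 'humans' occurs."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == "humans":
            hits.append(i)
        elif tok == "clinical" and i + 1 < len(tokens) and tokens[i + 1].startswith("trial"):
            hits.append(i)
    return hits


def score_article(text, start=2000, dist=5):
    """Count trial/human mentions that have no excluded term within `dist` tokens.

    Returns 0 when no Abstract or Methods section marker is present.
    """
    tokens = tokenize(text, start)
    if not SECTION_MARKERS.intersection(tokens):
        return 0
    score = 0
    for pos in trial_positions(tokens):
        window = tokens[max(0, pos - dist): pos + dist + 1]
        if not EXCLUDED_TERMS.intersection(window):
            score += 1
    return score
```

Each qualifying mention adds one to the score, which mirrors the "count it and add up a relevancy score" description; a production rule would also need the noun phrase extraction step that the slide mentions.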

Enterprise Text Analytics: Building on the Foundation: Applications
§ Focus on business value, cost cutting
§ Enhancing information access is a means, not an end
  – Governance, Records Management, Doc duplication, Compliance
  – Applications – Business Intelligence, CI, Behavior Prediction
  – eDiscovery, litigation support
  – Risk Management
  – Productivity / Portals – spider and categorize, extract
  – KM communities & knowledge bases
    • New sources – field notes into expertise, knowledge base – capture real time, own language-concepts

Enterprise Text Analytics: Applications
Pronoun Analysis: Fraud Detection; Enron Emails
§ Patterns of “Function” words reveal wide range of insights
§ Function words = pronouns, articles, prepositions, conjunctions, etc.
  – Used at a high rate, short and hard to detect, very social, processed in the brain differently than content words
§ Areas: sex, age, power-status, personality – individuals and groups
§ Lying / Fraud detection: Documents with lies have:
  – Fewer, shorter words, fewer conjunctions, more positive emotion words
  – More use of “if, any, those, he, she, they, you”, less “I”
§ Current research – 76% accuracy in some contexts – Italian – stylometry – linguistic hedges
§ Text Analytics can improve accuracy and utilize new sources
§ Data analytics (standard AML) can improve accuracy
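A minimal sketch of how the function-word signals listed above could be turned into numeric features. It only counts words; it is not a fraud classifier and does not reproduce the research the slide cites. The function name `function_word_profile` and the feature names are assumptions made for illustration.

```python
import re
from collections import Counter

# Words the slide lists as more frequent in deceptive documents.
DECEPTION_MARKERS = ("if", "any", "those", "he", "she", "they", "you")


def function_word_profile(text):
    """Return simple per-document rates for a few function-word signals."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    if total == 0:
        return {"marker_rate": 0.0, "first_person_rate": 0.0, "avg_word_length": 0.0}
    counts = Counter(words)
    marker_hits = sum(counts[w] for w in DECEPTION_MARKERS)
    return {
        # Share of tokens that are one of the listed marker words.
        "marker_rate": marker_hits / total,
        # Share of tokens that are a bare "I" (contractions such as "I'm" are not counted).
        "first_person_rate": counts["i"] / total,
        # Mean token length, a proxy for the "shorter words" signal.
        "avg_word_length": sum(len(w) for w in words) / total,
    }
```

Features like these would normally feed a statistical model alongside other evidence, which fits the slide's point that text analytics and data analytics each improve accuracy.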

Social Media: Next Generation
Beyond Simple Sentiment
§ Beyond Good and Evil (positive and negative)
  – Degrees of intensity, complexity of emotions and documents
§ Importance of Context – around positive and negative words
  – Rhetorical reversals – “I was expecting to love it”
  – Issues of sarcasm (“Really Great Product”), slanguage
§ Essential – need full categorization and concept extraction
§ New Taxonomies – Appraisal Groups – “not very good”
  – Supports more subtle distinctions than positive or negative
§ Emotion taxonomies - Joy, Sadness, Fear, Anger, Surprise, Disgust
  – New Complex – pride, shame, confusion, skepticism
§ New conceptual models, models of users, communities

Social Media: Next Generation
Behavior Prediction – Telecom Customer Service
§ Problem – distinguish customers likely to cancel from mere threats
§ Basic Rule
  – (START_20, (AND, (DIST_7, "[cancel]", "[cancel-what-cust]"),
    (NOT, (DIST_10, "[cancel]", (OR, "[one-line]", "[restore]", "[if]")))))
§ Examples:
  – customer called to say he will cancell his account if the does not stop receiving a call from the ad agency.
  – cci and is upset that he has the asl charge and wants it off or her is going to cancel his act
§ More sophisticated analysis of text and context in text
§ Combine text analytics with Predictive Analytics and traditional behavior monitoring for new applications
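The basic rule above combines a proximity test with a proximity-based exclusion. The following Python sketch shows that shape under stated assumptions: START_20 is read as "first 20 tokens", DIST_n as "within n tokens", and the word lists behind each bracketed concept are invented placeholders, since the slide does not show the real concept definitions.

```python
import re

# Hypothetical vocabularies for the bracketed concepts in the rule.
CONCEPTS = {
    "cancel": {"cancel", "cancell", "cancels", "canceling", "cancelling"},
    "cancel-what-cust": {"account", "acct", "act", "service"},
    "one-line": {"line"},
    "restore": {"restore", "restored", "reinstate"},
    "if": {"if", "unless"},
}


def positions(tokens, concept):
    """Token indexes that match any word of the named concept."""
    return [i for i, tok in enumerate(tokens) if tok in CONCEPTS[concept]]


def within(a_positions, b_positions, dist):
    """True if some position in a is at most `dist` tokens from some position in b."""
    return any(abs(a - b) <= dist for a in a_positions for b in b_positions)


def likely_to_cancel(note, start=20, near=7, guard=10):
    """Flag a note when a cancel word is near its object and no blocker is nearby."""
    tokens = re.findall(r"[a-z]+", note.lower())[:start]
    cancel = positions(tokens, "cancel")
    target = positions(tokens, "cancel-what-cust")
    blockers = [p for name in ("one-line", "restore", "if") for p in positions(tokens, name)]
    return within(cancel, target, near) and not within(cancel, blockers, guard)
```

For example, `likely_to_cancel("Customer wants to cancel his account today")` is flagged, while a conditional note such as "Customer will cancel his account if the charge is not removed" is not, because the "if" falls inside the guard window.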

Social Media: Next Generation
Variety of New Applications
§ Crowd Sourcing Technical Support
  – User Forums – find problem area, nearby text for solution
  – Automatic or Human mediated
§ Legal Review
  – Significant trend – computer-assisted review (manual = too many)
  – TA – categorize and filter to smaller, more relevant set
  – Payoff is big – One firm with 1.6 M docs – saved $2 M
§ Financial Services
  – Trend – using text analytics with predictive analytics – risk and fraud
  – Combine unstructured text (why) and transaction data (what)
  – Customer Relationship Management, Fraud Detection
  – Stock Market Prediction – Twitter, impact articles

Text Analytics: New Directions
Integration
§ Text and Data, Internal and External, Enterprise and Social
§ Focus - multiple approaches are needed and multiple ways to combine
  – Death to the Dichotomies – All of the Above
§ Massive parallelism or deeply integrated solution
  – Example of Watson - fast filtering to get to best 100 answers, then deep analysis of 100
§ Role of automatic / human
§ CRM – struggle to connect to enterprise
  – Have to learn to speak “enterprise”
§ Imply – Sentiment analysis focus for companies not enough
§ Enterprise and Social Media (Delve)
  – Social Media analysis and news aggregation
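The "fast filtering, then deep analysis" pattern mentioned for Watson can be illustrated as a generic two-stage ranker. This sketch is not Watson's implementation; `two_stage_rank`, `cheap_score` and `deep_score` are hypothetical names, and the shortlist size of 100 is taken only from the slide's wording.

```python
import heapq


def two_stage_rank(candidates, cheap_score, deep_score, shortlist_size=100):
    """Keep the best `shortlist_size` candidates by a cheap score, then
    re-rank only that shortlist with a more expensive score."""
    shortlist = heapq.nlargest(shortlist_size, candidates, key=cheap_score)
    return sorted(shortlist, key=deep_score, reverse=True)
```

The design point is cost: the expensive scorer runs on at most `shortlist_size` items no matter how many candidates arrive, so the cheap scorer only has to avoid discarding good answers rather than order them correctly.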

Delve for the Web: The Front Page of Knowledge Management
Users follow topics, people, and companies selected from Delve taxonomies. Social media data from Twitter powers recommendation algorithms.

Text Analytics: New Directions - Integration
Thinking Fast and Slow – Daniel Kahneman
§ System 1 – fast and automatic – little conscious control
§ Represents categories as prototypes – stereotypes
  – Norms for immediate detection of anomalies – distinguish the surprising from the normal
  – Fast detection of simple differences, detect hostility in a voice, find best chess move (if a master)
  – Priming / Anchoring – susceptible to systematic errors
  – Biased to believe and confirm
  – Focuses on existing evidence (ignores missing – WYSIATI)

Text Analytics: New Directions - Integration
Thinking Fast and Slow
§ System 2 – Complex, effortful judgments and calculations
  – System 2 is the only one that can follow rules, compare objects on several attributes, and make deliberate choices
  – Understand complex sentences, validity of logical argument
  – Focus attention – can make people blind to all else – Invisible Gorilla
§ Similar to traditional dichotomies – Tacit – Explicit, etc.
§ Basic Design – System 1 is basic to most experiences, and System 2 takes over when things get difficult – conscious control
§ Text Analysis and Text Mining / Auto-Cat and TA Cat

Text Analytics: New Directions - Integration
System 1 & 2 – and Text Analytics Approaches
§ “Automatic Categorization” – System 1 prototypes
  – Limited value -- only works in simple environments
  – Shallow categories with large differences
  – Not open to conscious control
§ System 2 – categories – complex, minute differences, deep categories
§ Together:
  – Choose one or other for some contexts
  – Combine both – need to develop new kinds of categories and/or new ways to combine?

Text Analytics: New Directions - Integration
Text Mining and Text Analytics
§ Text Analytics and Big Data enrich each other
  – Data tells you what people did, TA tells you why
§ Text Analytics – pre-processing for TM
  – Discover additional structure in unstructured text
  – New variables for Predictive Analytics, Social Media Analytics
  – New dimensions – 90% of information, 50% using Twitter analysis
§ Text Mining for TA – Semi-automated taxonomy development
  – Apply data methods, predictive analytics to unstructured text
  – New Models – Watson ensemble methods, reasoning apps
§ Extraction – smarter extraction – sections of documents, Boolean, advanced rules – drug names, adverse events – major mention

Text Analytics: New Directions - Integration – Text Analytics and CRM
§ Overall – growing demand for natural language processing, TA
  – Identify when a customer is angry or at risk of closing an account
  – Growth of regulatory compliance requirements is driving demand
  – Used to understand why people call and whether they were satisfied with the quality of the experience, diagnose issues and address them
  – Combine with Web analytics – need an integrated system
§ Contact Center Search – searching and analyzing customer data across multiple channels
  – Integration – Salesforce, Coveo, eGain, InQuira
§ Enterprise Feedback Management – want to track satisfaction and loyalty – issue of unstructured content: social media, multimedia channels
§ Contact Center Infrastructure
  – Importance of Cloud based Services and Infrastructure – Need Semantic Infrastructure
  – Cisco – Packaged Contact Center Enterprise
§ Web Support – virtual agents – deliver one answer to a customer’s question, not search results list
  – Missing – integrated knowledge management system

Future of Text Analytics
Obstacles - Survey Results
§ What factors are holding back adoption of TA?
  – Lack of clarity about TA and business value - 47%
  – Lack of senior management buy-in - 8.5%
§ Need articulated strategic vision and immediate practical win
§ Issue – TA is strategic, US wants short term projects
  – Sneak Project in, then build infrastructure – difficulty of speaking enterprise
§ Integration Issue – who owns infrastructure? IT, Library, ?
  – IT understands infrastructure, but not text
  – Need interdisciplinary collaboration – Stanford is offering English-Computer Science Degree – close, but really need a library-computer science degree

Future of Text Analytics
Primary Obstacle: Complexity
§ Usability of software is one element
§ More important is difficulty of conceptual-document models
  – Language is easy to learn, hard to understand model
§ Need to add more intelligence (semantic networks) and ways for the system to learn – social feedback
§ Customization – Text Analytics – heavily context dependent
  – Content, Questions, Taxonomy-Ontology
  – Level of specificity – Telecommunications
  – Specialized vocabularies, acronyms

New Directions in Text Analytics
Conclusions
§ Text Analytics is growing out (20%) and up – more mature applications and techniques
§ Find the right balance of infrastructure and application focus
§ Essential theme – integration – text and data, enterprise and social
§ Big obstacles remain
  – Strategic Vision of text analytics in the enterprise
  – Concrete and quick application to drive acceptance
§ Future – Women, Fire, and Dangerous Things
  – Text Analytics and Cognitive Science = Metaphor Analysis, deep language understanding, common sense?

Questions?
Tom Reamy
tomr@kapsgroup.com
KAPS Group
http://www.kapsgroup.com
Upcoming:
  – Text Analytics World SF - 2015
  – Workshop on Text Analytics: Enterprise Search Summit – New York, May 12-14
  – Taxonomy Boot Camp, ESS, KMWorld - DC, Nov 4-7
Fall Announcement!