Text Analytics World Current Applications and Future Directions
Text Analytics World
Current Applications and Future Directions of Text Analytics
Tom Reamy, Chief Knowledge Architect, KAPS Group
Program Chair – Text Analytics World
Knowledge Architecture Professional Services
http://www.kapsgroup.com
Agenda
§ Introduction: Current State of Text Analytics
– Survey / Discussion Themes
§ Enterprise Text Analytics
– Search – still fundamental
– Shift from information to business
§ Social Media – Next Generation
– Text Analytics and CRM
§ Integration
– Text and Data, Enterprise and Social
§ Future of Text Analytics
– Roadblocks, Deep Vision
§ Questions
Introduction: KAPS Group
§ Knowledge Architecture Professional Services – Network of Consultants
§ Applied Theory – Faceted taxonomies, complexity theory, natural categories, emotion taxonomies
§ Services:
– Strategy – IM & KM - Text Analytics, Social Media, Integration
– Taxonomy/Text Analytics development, consulting, customization
– Text Analytics Quick Start – Audit, Evaluation, Pilot
– Social Media: Text based applications – design & development
§ Partners – SAS, Smart Logic, Expert Systems, SAP, IBM, FAST, Concept Searching, Attensity, Clarabridge, Lexalytics
§ Projects – Portals, taxonomy, Text analytics – news, expertise location, information strategy, text analytics evaluation, Quick Start in Text A.
§ Clients: Genentech, Novartis, Northwestern Mutual Life, Financial Times, Hyatt, Home Depot, Harvard Business Library, British Parliament, Battelle, Amdocs, FDA, GAO, World Bank, etc.
§ Presentations, Articles, White Papers – www.kapsgroup.com
Current State of Text Analytics
§ History – academic research, focus on NLP
§ Inxight – out of Xerox PARC
– Moved TA from academic and NLP to auto-categorization, entity extraction, and search metadata
§ Explosion of companies – many based on Inxight extraction with some analytical-visualization front ends
– Half from 2008 are gone - lucky ones got bought
§ Early applications – news aggregation and enterprise search
– Second wave = shift to sentiment analysis
§ Enterprise search – 30-50% of market ($1 Bil)
§ Text Analytics is growing 20% a year, 10% of analytics
§ Fragmented market – no clear leader
Current State of Text Analytics: Vendor Space
§ Taxonomy Management – SchemaLogic, PoolParty
§ From Taxonomy to Text Analytics – Data Harmony, MultiTes
§ Extraction and Analytics – Linguamatics (Pharma), Temis, whole range of companies
§ Business Intelligence – ClearForest, Inxight
§ Sentiment Analysis – Attensity, Lexalytics, Clarabridge
§ Open Source – GATE
§ Stand-alone text analytics platforms – IBM, SAS, SAP, Smart Logic, Expert System, Basis, Open Text, Megaputer, Temis, Concept Searching
§ Embedded in Content Management, Search – Autonomy, FAST, Endeca, Exalead, etc.
Interviews with Leading Vendors, Analysts: Current Trends
§ From Mundane to Advanced – reducing manual labor to “Cognitive Computing”
§ Enterprise – Shift from Information to Business – cost cutting rather than productivity gains
§ Integration – data and text, text analytics and analytics
– Social Media – explosion of wild text, combine with data – customer browsing behavior, web analytics
§ Big Data – more focus on extraction (where it began) but categorization adds depth and sophistication
§ Shift away from IT – compliance, legal, advertising, CRM
§ US market different than Europe/Asia – project oriented
Enterprise Text Analytics
§ Search is still #1 = 30-50% of applications
§ New Standard Search – facets (more and more metadata), auto-categorization built on taxonomies, clustering
– Issue – consistent metadata, multiple content sources
§ Trend = Text Analytics/Search as Semantic Infrastructure
– Platform for Info Apps (search-based applications)
§ SharePoint – major focus of TA companies – fix problems with taxonomy/folksonomy
– Hybrid workflow – Publish document -> TA analysis -> suggestions for categorization, entities, metadata -> present to author
§ External information = more automation, extraction – precision more important
§ Use of predictive facets, enhanced relevance (FAST)
Enterprise Text Analytics
Adding Structure to Unstructured Content
§ Beyond Documents – categorization by corpus, by page, sections or even sentence or phrase
§ Documents are not unstructured – variety of structures
– Sections – Specific - “Abstract” to Function “Evidence”
– Corpus – document types/purpose
– Textual complexity, level of generality
§ Need to develop flexible categorization and taxonomy – tweets to 200 page PDF
§ Applications require sophisticated rules, not just categorization by similarity
Enterprise Text Analytics
Document Type Rules
§ (START_2000, (AND, (OR, _/article: "[Abstract]", _/article: "[Methods]"), (OR, _/article: "clinical trial*", _/article: "humans", (NOT, (DIST_5, (OR, _/article: "approved", _/article: "safe", _/article: "use", _/article: "animals"),
§ If the article has sections like Abstract or Methods AND has phrases around “clinical trials / Humans” and not words like “animals” within 5 words of “clinical trial” words – count it and add up a relevancy score
§ Primary issue – major mentions, not every mention
– Combination of noun phrase extraction and categorization
– Results – virtually 100%
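The logic of the rule above can be approximated in a few lines of Python. This is only a sketch, not the vendor rule engine: it assumes START_2000 means "look at the first 2000 characters", that DIST_5 means "within 5 tokens", and that a section is present if its name appears in that opening text. The function name `score_document` and the constants are made up for illustration.

```python
import re

# Assumed stand-ins for the rule's terms; the slide's rule is truncated, so this
# covers only the parts that are visible.
SECTION_MARKERS = ("abstract", "methods")
EXCLUDE_TERMS = {"approved", "safe", "use", "animals"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def score_document(text, window=2000, dist=5):
    """Count 'clinical trial*' / 'humans' mentions that have no excluded term nearby."""
    head = text[:window]  # assumption: START_2000 limits the rule to the opening text
    if not any(marker in head.lower() for marker in SECTION_MARKERS):
        return 0  # neither an Abstract nor a Methods section was found
    tokens = tokenize(head)
    score = 0
    for i, tok in enumerate(tokens):
        is_trial = tok == "humans" or (
            tok == "clinical" and i + 1 < len(tokens) and tokens[i + 1].startswith("trial")
        )
        if not is_trial:
            continue
        nearby = tokens[max(0, i - dist): i + dist + 1]
        if EXCLUDE_TERMS.isdisjoint(nearby):
            score += 1  # a qualifying mention adds to the relevancy score
    return score
```

A document whose opening text mentions a clinical trial on humans scores above zero, while one that only discusses trials on animals scores zero. A real deployment would add noun phrase extraction to separate major mentions from passing ones.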
Enterprise Text Analytics
Building on the Foundation: Applications
§ Focus on business value, cost cutting
§ Enhancing information access is a means, not an end
– Governance, Records Management, Doc duplication, Compliance
– Applications – Business Intelligence, CI, Behavior Prediction
– eDiscovery, litigation support
– Risk Management
– Productivity / Portals – spider and categorize, extract
– KM communities & knowledge bases
• New sources – field notes into expertise, knowledge base – capture real time, own language-concepts
Enterprise Text Analytics: Applications
Pronoun Analysis: Fraud Detection; Enron Emails
§ Patterns of “Function” words reveal wide range of insights
§ Function words = pronouns, articles, prepositions, conjunctions, etc.
– Used at a high rate, short and hard to detect, very social, processed in the brain differently than content words
§ Areas: sex, age, power-status, personality – individuals and groups
§ Lying / Fraud detection: Documents with lies have:
– Fewer, shorter words, fewer conjunctions, more positive emotion words
– More use of “if, any, those, he, she, they, you”, less “I”
§ Current research – 76% accuracy in some contexts
– Italian – stylometry – linguistic hedges
§ Text Analytics can improve accuracy and utilize new sources
§ Data analytics (standard AML) can improve accuracy
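As a rough illustration of the function-word signals listed above, the following Python sketch computes a few per-document rates. It is not a fraud detector and does not reproduce the cited research. The word lists, especially `CONJUNCTIONS`, and the function name `function_word_profile` are assumptions chosen for the example.

```python
import re

# Words the slide associates with deceptive text, plus an assumed short
# conjunction list; neither is a validated lexicon.
OTHER_FOCUSED = {"if", "any", "those", "he", "she", "they", "you"}
FIRST_PERSON = {"i"}
CONJUNCTIONS = {"and", "but", "or", "so", "because", "although"}


def function_word_profile(text):
    """Return simple per-word rates that could feed a downstream classifier."""
    words = re.findall(r"[a-z']+", text.lower())
    n = len(words)
    if n == 0:
        return {"avg_word_length": 0.0, "first_person_rate": 0.0,
                "other_focused_rate": 0.0, "conjunction_rate": 0.0}
    return {
        "avg_word_length": sum(len(w) for w in words) / n,
        "first_person_rate": sum(w in FIRST_PERSON for w in words) / n,
        "other_focused_rate": sum(w in OTHER_FOCUSED for w in words) / n,
        "conjunction_rate": sum(w in CONJUNCTIONS for w in words) / n,
    }
```

The resulting rates are the kind of text-derived variables that could be combined with transaction data in a standard AML model. Any thresholds would have to be learned from labeled examples.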
Social Media: Next Generation
Beyond Simple Sentiment
§ Beyond Good and Evil (positive and negative)
– Degrees of intensity, complexity of emotions and documents
§ Importance of Context – around positive and negative words
– Rhetorical reversals – “I was expecting to love it”
– Issues of sarcasm (“Really Great Product”), slanguage
§ Essential – need full categorization and concept extraction
§ New Taxonomies – Appraisal Groups – “not very good”
– Supports more subtle distinctions than positive or negative
§ Emotion taxonomies - Joy, Sadness, Fear, Anger, Surprise, Disgust
– New Complex – pride, shame, confusion, skepticism
§ New conceptual models, models of users, communities
Social Media: Next Generation
Behavior Prediction – Telecom Customer Service
§ Problem – distinguish customers likely to cancel from mere threats
§ Basic Rule
– (START_20, (AND, (DIST_7, "[cancel]", "[cancel-what-cust]"), (NOT, (DIST_10, "[cancel]", (OR, "[one-line]", "[restore]", "[if]")))))
§ Examples:
– customer called to say he will cancell his account if the does not stop receiving a call from the ad agency.
– cci and is upset that he has the asl charge and wants it off or her is going to cancel his act
§ More sophisticated analysis of text and context in text
§ Combine text analytics with Predictive Analytics and traditional behavior monitoring for new applications
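The basic rule above can be sketched in Python to show how the proximity and negation operators interact. Everything here is an assumption made for illustration: the vocabularies standing in for the bracketed sub-categories, the reading of START_20 as "the cancel word occurs within the first 20 tokens", and the function name `likely_to_cancel`.

```python
import re

# Hypothetical vocabularies for the sub-categories named in the rule.
CANCEL_WHAT = {"account", "acct", "act", "service", "contract"}  # "[cancel-what-cust]"
BLOCKERS = {"line", "restore", "if"}  # "[one-line]", "[restore]", "[if]"


def tokenize(text):
    """Lowercase the note and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def likely_to_cancel(text, start=20, near=7, block=10):
    """True if a cancel word has an object nearby and no conditional or blocker nearby."""
    tokens = tokenize(text)
    for i, tok in enumerate(tokens[:start]):
        if not tok.startswith("cancel"):  # also catches misspellings such as "cancell"
            continue
        has_object = any(t in CANCEL_WHAT for t in tokens[max(0, i - near): i + near + 1])
        blocked = any(t in BLOCKERS for t in tokens[max(0, i - block): i + block + 1])
        if has_object and not blocked:
            return True
    return False
```

Under these assumptions the first example on the slide is treated as a conditional threat, because "if" falls within ten tokens of "cancell", so the rule does not fire. An unconditional note such as "customer wants to cancel his account today" does fire.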
Social Media: Next Generation
Variety of New Applications
§ Crowd Sourcing Technical Support
– User Forums – find problem area, nearby text for solution
– Automatic or Human mediated
§ Legal Review
– Significant trend – computer-assisted review (manual = too many)
– TA - categorize and filter to smaller, more relevant set
– Payoff is big – One firm with 1.6 M docs – saved $2 M
§ Financial Services
– Trend – using text analytics with predictive analytics – risk and fraud
– Combine unstructured text (why) and transaction data (what)
– Customer Relationship Management, Fraud Detection
– Stock Market Prediction – Twitter, impact articles
Text Analytics: New Directions
Integration
§ Text and Data, Internal and External, Enterprise and Social
§ Focus - multiple approaches are needed and multiple ways to combine
– Death to the Dichotomies – All of the Above
§ Massive parallelism or deeply integrated solution
– Example of Watson - fast filtering to get to best 100 answers, then deep analysis of 100
– Role of automatic / human
§ CRM – struggle to connect to enterprise
– Have to learn to speak “enterprise”
§ Implication – a sentiment analysis focus is not enough for companies
§ Enterprise and Social Media (Delve)
– Social Media analysis and news aggregation
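The "fast filtering, then deep analysis" pattern mentioned for Watson can be illustrated with a generic two-stage ranker. This is not how Watson is implemented. Both scoring functions are toy stand-ins (word overlap for the fast pass, shared word pairs for the deep pass), and the names `cheap_score`, `deep_score`, and `two_stage_rank` are invented for the sketch.

```python
def cheap_score(query, doc):
    """Fast pass: number of distinct query words that appear in the document."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


def bigrams(text):
    """Set of adjacent word pairs in the text."""
    words = text.lower().split()
    return set(zip(words, words[1:]))


def deep_score(query, doc):
    """Stand-in for an expensive analysis: number of shared adjacent word pairs."""
    return len(bigrams(query) & bigrams(doc))


def two_stage_rank(query, docs, k=100):
    """Keep the k best candidates from the cheap pass, then re-rank them deeply."""
    shortlist = sorted(docs, key=lambda d: cheap_score(query, d), reverse=True)[:k]
    return sorted(shortlist, key=lambda d: deep_score(query, d), reverse=True)
```

The design point is that the expensive scorer only ever sees `k` candidates, so its cost stays fixed however large the collection grows.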
Delve for the Web: The Front Page of Knowledge Management
Users follow topics, people, and companies selected from Delve taxonomies. Social media data from Twitter powers recommendation algorithms.
Text Analytics: New Directions - Integration
Thinking Fast and Slow – Daniel Kahneman
§ System 1 – fast and automatic – little conscious control
§ Represents categories as prototypes – stereotypes
– Norms for immediate detection of anomalies – distinguish the surprising from the normal
– Fast detection of simple differences, detect hostility in a voice, find best chess move (if a master)
– Priming / Anchoring – susceptible to systematic errors
– Biased to believe and confirm
– Focuses on existing evidence (ignores missing – WYSIATI)
Text Analytics: New Directions - Integration
Thinking Fast and Slow
§ System 2 – Complex, effortful judgments and calculations
– System 2 is the only one that can follow rules, compare objects on several attributes, and make deliberate choices
– Understand complex sentences, validity of logical argument
– Focus attention – can make people blind to all else – Invisible Gorilla
§ Similar to traditional dichotomies – Tacit – Explicit, etc.
§ Basic Design – System 1 is basic to most experiences, and System 2 takes over when things get difficult – conscious control
§ Text Analysis and Text Mining / Auto-Cat and TA Cat
Text Analytics: New Directions - Integration
System 1 & 2 – and Text Analytics Approaches
§ “Automatic Categorization” – System 1 prototypes
– Limited value - only works in simple environments
– Shallow categories with large differences
– Not open to conscious control
§ System 2 – categories – complex, minute differences, deep categories
§ Together:
– Choose one or other for some contexts
– Combine both – need to develop new kinds of categories and/or new ways to combine?
Text Analytics: New Directions - Integration
Text Mining and Text Analytics
§ Text Analytics and Big Data enrich each other
– Data tells you what people did, TA tells you why
§ Text Analytics – pre-processing for TM
– Discover additional structure in unstructured text
– New variables for Predictive Analytics, Social Media Analytics
– New dimensions – 90% of information, 50% using Twitter analysis
§ Text Mining for TA – Semi-automated taxonomy development
– Apply data methods, predictive analytics to unstructured text
– New Models – Watson ensemble methods, reasoning apps
§ Extraction – smarter extraction – sections of documents, Boolean, advanced rules – drug names, adverse events – major mention
Text Analytics: New Directions - Integration – Text Analytics and CRM
§ Overall – growing demand for natural language processing, TA
– Identify when a customer is angry or at risk of closing an account
– Growth of regulatory compliance requirements is driving demand
– Used to understand why people call and whether they were satisfied with the quality of the experience, diagnose issues and address them
– Combine with Web analytics – need an integrated system
§ Contact Center Search – searching and analyzing customer data across multiple channels
– Integration – Salesforce, Coveo, eGain, InQuira
§ Enterprise Feedback Management – want to track satisfaction and loyalty – issue of unstructured content, social media, multimedia channels
§ Contact Center Infrastructure
– Importance of Cloud based Services and Infrastructure
– Need Semantic Infrastructure
– Cisco – Packaged Contact Center Enterprise
§ Web Support – virtual agents – deliver one answer to a customer’s question, not search results list
– Missing – integrated knowledge management system
Future of Text Analytics
Obstacles - Survey Results
§ What factors are holding back adoption of TA?
– Lack of clarity about TA and business value - 47%
– Lack of senior management buy-in - 8.5%
§ Need articulated strategic vision and immediate practical win
§ Issue – TA is strategic, US wants short term projects
– Sneak project in, then build infrastructure – difficulty of speaking enterprise
§ Integration Issue – who owns infrastructure? IT, Library, ?
– IT understands infrastructure, but not text
– Need interdisciplinary collaboration – Stanford is offering English-Computer Science degree – close, but really need a library-computer science degree
Future of Text Analytics
Primary Obstacle: Complexity
§ Usability of software is one element
§ More important is difficulty of conceptual-document models
– Language is easy to learn, hard to understand model
§ Need to add more intelligence (semantic networks) and ways for the system to learn – social feedback
§ Customization – Text Analytics – heavily context dependent
– Content, Questions, Taxonomy-Ontology
– Level of specificity – Telecommunications
– Specialized vocabularies, acronyms
New Directions in Text Analytics
Conclusions
§ Text Analytics is growing out (20%) and up – more mature applications and techniques
§ Find the right balance of infrastructure and application focus
§ Essential theme – integration – text and data, enterprise and social
§ Big obstacles remain
– Strategic Vision of text analytics in the enterprise
– Concrete and quick application to drive acceptance
§ Future – Women, Fire, and Dangerous Things
– Text Analytics and Cognitive Science = Metaphor Analysis, deep language understanding, common sense?
Questions?
Tom Reamy
tomr@kapsgroup.com
KAPS Group
http://www.kapsgroup.com
Upcoming:
Text Analytics World SF - 2015
Workshop on Text Analytics: Enterprise Search Summit – New York, May 12-14
Taxonomy Boot Camp, ESS, KMWorld - DC, Nov 4-7
Fall Announcement!