Text Analytics And Text Mining
Best of Text and Data
Tom Reamy, Chief Knowledge Architect
KAPS Group, Knowledge Architecture Professional Services
http://www.kapsgroup.com
Agenda
§ Text Analytics Capabilities
§ Text Analytics Applications
§ Text Mining and Text Analytics – Data and Unstructured Content
§ Case Study – Text Mining for Taxonomy Development
§ Conclusion
KAPS Group: General
§ Knowledge Architecture Professional Services
§ Virtual Company: Network of consultants – 8-10
§ Partners – SAS, Smart Logic, Microsoft-FAST, Concept Searching, etc.
§ Consulting, Strategy, Knowledge architecture audit
§ Services:
– Text Analytics evaluation, development, consulting, customization
– Knowledge Representation – taxonomy, ontology, Prototype
– Metadata standards and implementation
– Knowledge Management: Collaboration, Expertise, e-learning
§ Applied Theory – Faceted taxonomies, complexity theory, natural categories
Introduction to Text Analytics Features
§ Noun Phrase Extraction
– Catalogs with variants, rule based dynamic
– Multiple types, custom classes – entities, concepts, events
– Feeds facets
§ Summarization
– Customizable rules, map to different content
§ Fact Extraction
– Relationships of entities – people-organizations-activities
– Ontologies – triples, RDF, etc.
§ Sentiment Analysis
– Statistical, rules – full categorization set of operators
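The "catalogs with variants" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the catalog contents, the entity type labels, and the function name extract_entities are hypothetical, not taken from any product named in these slides.

```python
import re
from collections import defaultdict

# Hypothetical catalog: canonical name -> (entity type, known variants).
CATALOG = {
    "International Business Machines": ("organization", ["IBM", "I.B.M."]),
    "SAS Institute": ("organization", ["SAS"]),
}

def extract_entities(text, catalog=CATALOG):
    """Return {entity type: sorted canonical names} for catalog hits in text."""
    found = defaultdict(set)
    for canonical, (etype, variants) in catalog.items():
        for name in [canonical, *variants]:
            # Match the variant only when it is not embedded in a longer word.
            pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
            if re.search(pattern, text):
                found[etype].add(canonical)
    return {etype: sorted(names) for etype, names in found.items()}
```

Mapping every variant back to one canonical name is what lets the extracted entities feed a facet: the facet shows one value per entity rather than one per spelling.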
Introduction to Text Analytics Features
§ Auto-categorization
– Training sets – Bayesian, Vector space
– Terms – literal strings, stemming, dictionary of related terms
– Rules – simple – position in text (Title, body, url)
– Semantic Network – Predefined relationships, sets of rules
– Boolean – Full search syntax – AND, OR, NOT
– Advanced – NEAR (#), PARAGRAPH, SENTENCE
§ This is the most difficult to develop
§ Build on a Taxonomy
§ Combine with Extraction, Sentiment
§ Foundation for best text analytics & combination
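A rough sketch of Boolean categorization rules with a NEAR operator follows. The operator functions, the two category names, and their rules are assumptions made for illustration; commercial platforms each have their own rule syntax, which is not reproduced here.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

# Each operator returns a rule: a function from a token list to True/False.
def has(term):
    return lambda tokens: term in tokens

def AND(*rules):
    return lambda tokens: all(rule(tokens) for rule in rules)

def OR(*rules):
    return lambda tokens: any(rule(tokens) for rule in rules)

def NOT(rule):
    return lambda tokens: not rule(tokens)

def NEAR(a, b, distance):
    """True when terms a and b occur within `distance` tokens of each other."""
    def rule(tokens):
        pos_a = [i for i, t in enumerate(tokens) if t == a]
        pos_b = [i for i, t in enumerate(tokens) if t == b]
        return any(abs(i - j) <= distance for i in pos_a for j in pos_b)
    return rule

# Hypothetical taxonomy nodes and rules, for illustration only.
RULES = {
    "Vaccine Safety": AND(has("vaccine"), OR(has("adverse"), has("reaction"))),
    "Enterprise Search": AND(NEAR("enterprise", "search", 3), NOT(has("vaccine"))),
}

def categorize(text, rules=RULES):
    """Return the names of all categories whose rule matches the text."""
    tokens = tokenize(text)
    return [name for name, rule in rules.items() if rule(tokens)]
```

Because rules are plain composable functions, one rule per taxonomy node can be built up from simpler ones, which mirrors the "build on a taxonomy" point above.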
Varieties of Taxonomy / Text Analytics Software
§ Taxonomy Management
– Synaptica, SchemaLogic
§ Full Platform
– SAS-Teragram, SAP-Inxight, Smart Logic, Data Harmony, Concept Searching, Expert System, IBM, GATE
§ Content Management – embedded
§ Embedded – Search
– FAST, Autonomy, Endeca, Exalead, etc.
§ Specialty
– Sentiment Analysis, VOC – Lexalytics, Attensity / Reports
– Ontology – extraction, plus ontology
Text Analytics Applications: Platform for Multiple Applications
§ Content Aggregation, Duplicate Documents – save millions!
§ Business intelligence, Customer Intelligence
§ Social Media - sentiment analysis, Voice of the Customer
§ Social – Hybrid folksonomy / taxonomy / auto-metadata
§ Social – expertise, categorize tweets and blogs, reputation
§ Ontology – travel assistant, semantic web, etc.
§ eDiscovery, Reputation management, Customer Experience
§ Expertise Location, Crowd sourcing Technical support
Text Analytics Applications: Enterprise Search - Elements
§ Text Analytics can “solve” enterprise search
§ Multiple Knowledge Structures
– Facet – orthogonal dimension of metadata
– Taxonomy - Subject matter / aboutness
§ Software - Search, ECM, auto-categorization, entity extraction, Text Analytics and Text Mining
§ People – tagging, evaluating tags, fine tune rules and taxonomy
§ Rich Search Results – context and conversation
§ Platform for search based applications
Text Analytics and Text Mining: Data and Unstructured Content
§ 80% of content is unstructured – adding to semantic web is major
§ Text Analytics – content into data – Big Data meets Big Content
§ Real integration of text and ontology
– Beyond “hasDescription”
– Improve accuracy of extracted entities, facts – disambiguation
  • Pipeline – oil & gas OR research / Ford
– Add Concepts, not just “Things” – 68% want this
§ Semantic Web + Text Analytics = real world value
§ Linked Data + Text Analytics – best of both worlds
§ Build superior foundation elements – taxonomies, categorization
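The "Pipeline" disambiguation example can be sketched as a simple context-cue count. The cue word lists and the function name disambiguate are hypothetical, and a production system would use richer rules or a trained model instead of this bare overlap count.

```python
import re

# Hypothetical context cues for two senses of the ambiguous term "pipeline".
SENSES = {
    "oil & gas": {"oil", "gas", "crude", "drilling", "barrel"},
    "research": {"drug", "clinical", "trial", "compound", "candidate"},
}

def disambiguate(text, senses=SENSES):
    """Pick the sense whose cue words overlap most with the text.

    Returns None when no cue word is present; ties go to the sense
    listed first in `senses`.
    """
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    scores = {sense: len(cues & tokens) for sense, cues in senses.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

For example, disambiguate("The pipeline carries crude oil to the coast.") picks the "oil & gas" sense because two of its cue words appear in the sentence.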
Text Analytics and Text Mining and Data Mining: Vaccine Adverse Reaction
§ Combine with Data Mining
§ New sources of information
§ News stories, medical records
§ Blogs, social
§ Find new connections, sources of knowledge
§ Vaccine Adverse Effects – disease, symptoms, variables
§ Unstructured text into a data source
§ Some preliminary analysis, content structure
§ Find unknown adverse effects and prevalence
§ Drug Discovery + search / research – 5 year story
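One way to picture "unstructured text into a data source" is to count which symptom terms co-occur with which vaccine terms across report narratives, producing rows that a data mining tool could consume. The vocabularies and the function name below are invented for illustration and do not come from the project described in the slides.

```python
import re
from collections import Counter

# Hypothetical controlled vocabularies, for illustration only.
VACCINES = {"flu", "mmr"}
SYMPTOMS = {"fever", "rash", "headache"}

def cooccurrence(reports):
    """Count reports in which a vaccine term and a symptom term both appear."""
    counts = Counter()
    for report in reports:
        tokens = set(re.findall(r"[a-z]+", report.lower()))
        for vaccine in VACCINES & tokens:
            for symptom in SYMPTOMS & tokens:
                counts[(vaccine, symptom)] += 1
    return counts
```

Each (vaccine, symptom) count is a structured data point derived from free text. Co-occurrence alone says nothing about causation, so the counts are a starting point for analysis, not a finding.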
Text Analytics Applications Example – Vaccine Adverse Effects
Text Analytics and Text Mining: Case Study – Taxonomy Development
§ Problem – 200,000 new uncategorized documents
§ Old taxonomy – need one that reflects change in corpus
§ Text mining, entity extraction, categorization
§ Bottom Up – terms in documents – frequency, date
§ Clustering – suggested categories
§ Clustering – chunking for editors
§ Time savings – only feasible way to scan documents
§ Quality – important terms, co-occurring terms
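The bottom-up step (frequent terms and co-occurring terms as taxonomy candidates) might look like the following sketch. The stopword list, the min_docs threshold, and the function names are assumptions; the case study's actual tooling is not described in the slides.

```python
import re
from collections import Counter
from itertools import combinations

# Small illustrative stopword list; a real project would use a fuller one.
STOPWORDS = {"the", "of", "and", "a", "in", "for", "to", "on", "with"}

def terms(doc):
    """Return the set of non-stopword terms in one document."""
    return {t for t in re.findall(r"[a-z]+", doc.lower()) if t not in STOPWORDS}

def candidate_terms(docs, min_docs=2):
    """Return (frequent terms, co-occurring pairs) seen in >= min_docs documents."""
    doc_freq = Counter()
    pair_freq = Counter()
    for doc in docs:
        doc_terms = terms(doc)
        doc_freq.update(doc_terms)
        # Sort so each pair is always counted under the same key order.
        pair_freq.update(combinations(sorted(doc_terms), 2))
    frequent = {t: n for t, n in doc_freq.items() if n >= min_docs}
    pairs = {p: n for p, n in pair_freq.items() if n >= min_docs}
    return frequent, pairs
```

Frequent terms suggest candidate taxonomy nodes, while frequent pairs hint at terms that belong under the same node. Editors then review the suggestions instead of reading every document.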
Text Analytics and Text Mining: Case Study – Taxonomy Development
§ Text into Data: Article, Abstract, Title, Subtitle – fields & source of terms
§ Add Data: PubDate, journalTitle, Taxonomy Node
§ Terms – Map to frequency, date ranges, Taxonomy Node
– New Terms, Trends
§ Relevance – frequency, Abstract, Title, human judgment
§ Entity Extraction – Authors, Organizations, Products
§ Categorization – build on clusters & taxonomy
§ Combination – reports, visualizations, interactive explorations
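Mapping terms to frequency and date ranges, and then spotting new terms, can be sketched as below. The record layout borrows the field names from the slide (Title, Abstract, PubDate), but the ISO date format and both function names are assumptions.

```python
import re
from collections import defaultdict

def term_timeline(records):
    """Map each term to {year: number of records containing it}.

    Assumes each record is a dict with "Title", "Abstract", and an
    ISO-style "PubDate" string such as "2011-02-10" (an assumption).
    """
    timeline = defaultdict(lambda: defaultdict(int))
    for rec in records:
        year = int(rec["PubDate"][:4])
        text = rec["Title"] + " " + rec["Abstract"]
        for term in set(re.findall(r"[a-z]+", text.lower())):
            timeline[term][year] += 1
    return timeline

def new_terms(timeline, since_year):
    """Return terms whose earliest appearance is in or after since_year."""
    return sorted(t for t, years in timeline.items() if min(years) >= since_year)
```

Terms that first appear late in the corpus are candidates for new taxonomy nodes, which is one way to see how the corpus has drifted away from the old taxonomy.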
Case Study – Taxonomy Development
Conclusion
§ Text Analytics impact is huge – solve information overload
§ Enterprise Search and Search Based Applications: Save millions and enhance productivity
§ Combination of Text Analytics & Text Mining – unlimited range of applications
§ Mutual Enrichment – more data, add structure to unstructured
§ Add Ontology = Richer Text Analytics – smarter, more useful
§ Text Analytics + Text Mining + Semantic Web – Move from theory to new practical applications
§ The best is yet to come!
Questions?
Tom Reamy
tomr@kapsgroup.com
KAPS Group, Knowledge Architecture Professional Services
http://www.kapsgroup.com
- Slides: 30