Text Analytics Forum
Program Chair: Tom Reamy, Chief Knowledge Architect, KAPS Group
http://www.kapsgroup.com
Author: Deep Text

Agenda
§ Introduction
  – Welcome
  – Overview of Conference
§ Text Analytics Introduction
  – What is it?
  – What is it good for?
  – Results of TAF Survey
§ Key Ideas – Present and Future
§ Questions

Text Analytics Forum (TAF) Introduction
Conference Highlights and Themes
§ Newest member of Info Today family of conferences
  – First year of many?
§ KMWorld
  – TA is a means of enriching KM
  – Enriched content, expertise, collaboration
§ TBC
  – TA Minds the Gap between taxonomy and content
  – New knowledge organizations, cognitive-based
§ ESD
  – TA is best means of improving search
  – Faceted search, semi-automated subject tagging
§ SharePoint
  – All major TA vendors integrate with it
  – Hybrid model – software characterizing document, sent to author/editor for human check

Conference Highlights and Themes
§ Overview of field of text analytics
  – General and current market by Seth Grimes
§ Two tracks – technical / business & applications
  – Could be development and applications
§ Technical
  – AI and TA, Cognitive computing, graph databases, Text and Data
  – ML vs. Rules, taxonomy, Auto-categorization
§ Business / Applications
  – Search & TA, Fake News & Ads, TA and Taxonomy
  – Case Studies, New Applications, Issues in Applications
§ Ask the Experts Panel
  – Some questions about the field of TA
  – We want your questions

Deep Text: The Book – Who Am I?
§ Professional student / independent consultant – all but 6 years
§ History of Ideas to Programmer
  – AI (Only 2 years away)
  – Games – Galactic Gladiators/Adventures – still available
§ KAPS Group – 13 years, Network of consultants (“hiring”)
  – Taxonomy to text analytics
  – Consulting, development – platform and applications
  – Strategy, Smart Start, Search, Smart Social Media
  – TA Training (1 day to 1 month), TA Audit
  – Partners – Synaptica, SAS, IBM, Expert System, Smartlogic, etc.
  – Clients: Genentech, Novartis, Northwestern Mutual Life, Financial Times, Hyatt, Home Depot, Harvard, British Parliament, Battelle, Amdocs, FDA, GAO, World Bank, Dept. of Transportation, etc.
§ Presentations, Articles, White Papers – www.kapsgroup.com

“A treasure trove of technical detail, likely to become a definitive source on text analytics” – Kirkus Reviews

What is Text Analytics?
§ Text analytics is the use of software and knowledge models to analyze/utilize structures in poly-structured text.
§ Text Mining – NLP, statistical, predictive, machine learning
  • Different skills, mind set, Math & data not language
§ Annotation/Extraction – entities and facts – known and unknown, concepts, events – catalogs with variants, rule based
§ Sentiment Analysis
  • Entities and sentiment words – statistics & rules
§ Summarization
  • Dynamic – based on a search query term
  • Document – based on primary topics, position in document
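
The "catalogs with variants" idea for extraction can be illustrated with a minimal Python sketch. The catalog contents, the function name `extract_entities`, and the whole-word matching strategy are assumptions made for illustration; the slides do not describe any particular vendor's implementation.

```python
import re

# Hypothetical catalog: canonical entity name -> known surface variants.
CATALOG = {
    "IBM": ["IBM", "Big Blue"],
    "Food and Drug Administration": ["FDA", "Food and Drug Administration"],
}

def extract_entities(text):
    """Return (canonical_name, matched_text) pairs in order of appearance."""
    found = []
    for canonical, variants in CATALOG.items():
        for variant in variants:
            # Whole-word, case-insensitive match of each variant.
            pattern = r"\b" + re.escape(variant) + r"\b"
            for m in re.finditer(pattern, text, flags=re.IGNORECASE):
                found.append((m.start(), canonical, m.group(0)))
    # Sort by position so results follow the order of the text.
    return [(canonical, surface) for _, canonical, surface in sorted(found)]

print(extract_entities("The FDA met with IBM; Big Blue declined comment."))
```

Mapping every variant back to one canonical name is what lets extracted entities be counted and faceted consistently; finding unknown entities would need rules or statistics beyond this lookup.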

What is Text Analytics?
§ Auto-categorization = the brains of the outfit
  – Training sets – Bayesian, Vector space
  – Terms – literal strings, stemming, dictionary of related terms
  – Boolean – Full search syntax – AND, OR, NOT
  – Advanced – DIST(#), ORDDIST#, PARAGRAPH, SENTENCE
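
A rough Python sketch of how Boolean and proximity operators might combine in a categorization rule. The reading of DIST(#) as "within # tokens in any order" and of ORDDIST as its ordered variant is an assumption, as are the function names and the example rule; real categorization languages differ by vendor.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def within_distance(tokens, term_a, term_b, max_gap, ordered=False):
    """Assumed DIST-style test: both terms occur within max_gap tokens.

    With ordered=True (assumed ORDDIST-style), term_a must come first.
    """
    pos_a = [i for i, t in enumerate(tokens) if t == term_a]
    pos_b = [i for i, t in enumerate(tokens) if t == term_b]
    for i in pos_a:
        for j in pos_b:
            gap = j - i
            if ordered and 0 < gap <= max_gap:
                return True
            if not ordered and abs(gap) <= max_gap:
                return True
    return False

def is_merger_news(text):
    """Hypothetical rule: (merger OR acquisition) AND NOT rumor
    AND DIST(5) between 'deal' and 'approved'."""
    tokens = tokenize(text)
    vocab = set(tokens)
    boolean_part = ("merger" in vocab or "acquisition" in vocab) and "rumor" not in vocab
    proximity_part = within_distance(tokens, "deal", "approved", 5)
    return boolean_part and proximity_part

print(is_merger_news("The merger deal was approved by regulators."))
```

PARAGRAPH and SENTENCE operators would follow the same pattern, with the token window replaced by paragraph or sentence boundaries.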

What is Text Analytics Good For?
§ Just about anything textual you can think of
§ Enterprise: Search, BI, CI, Financial Services, eDiscovery, etc.
  – Fraud – Function word patterns
  – Adding text (depth and intelligence) to all data-based applications
  – Whole new applications – customers likely to cancel, new?
§ Social:
  – Social Media analysis – adding text to data
  – Sentiment analysis – beyond positive and negative
  – Fake news – multiple module model

Future Directions: Survey Results – 2017
§ Important Areas:
  – Business Intelligence – 87%
  – Decision Support – 83%
  – Financial Intelligence – 81%
  – KM-Productivity – 80%
  – Search – Search Apps – 78%
  – Security – 77%
  – Compliance – 76%
  – Voice of Customer – 73%
  – Social Media Analysis – 69%

Future Directions: Survey Results – 2017
§ Who is driving TA?
  – R&D – 25%
  – IT – 22%
  – Rest are minor
§ Factors slowing adoption of TA
  – Lack of Knowledge/value – 43%
  – Financial – 18%
  – Lack of in-house expertise – 11%
§ What new capabilities?
  – Deep Learning, ML, AI – 23%

Future Directions: Survey Results – 2017
§ What do you like about TA software?
  – Ease of Use
  – Configurability
  – Accuracy, quality of results
§ What don’t you like?
  – Difficult
  – No one solution – domain specific
§ Most difficult aspect of TA initiatives?
  – Data Preparation
  – Language complexity
  – Understanding business needs, domain resources

Key Ideas / Trends in Text Analytics

Key Ideas in Text Analytics
§ AI / Deep Learning to the Rescue?
  – Humans obsolete or empowered?
§ Machine Learning vs. Rules-based
§ Poly-structured text – Content Types and Sections
§ New Knowledge Structures – Cognitive & Social

Key Ideas in Text Analytics: Deep Learning
§ Neural Networks – from 1980’s
§ New = size and speed
  – Larger networks = can learn better and faster
  – Multiple networks = more “intelligence” – networks output fed to other networks
§ Strongest in areas like image recognition, physical patterns
§ Weakest – concepts, subjects, deep language, metaphors, etc.

Deep Text vs. Deep Learning
§ Deep Learning is a Dead End – accuracy – 60-70%
§ Black Box – don’t know how to improve except indirect manipulation of input
  – Watson – “We don’t know how or why it works”
  – Susceptible to bias – hard to fix
§ Domain Specific, data not deep understanding
§ No common sense (things fall, don’t wink in and out of existence)
  – No strategy to get there (faster not enough)
§ Major – loss of quality – who is training who?
  – Project personality and intelligence – on everything!
§ Extra Benefits of a Deep Text Approach – Multiple InfoApps

Key Ideas in Text Analytics: Automatic Taxonomy
§ Most Text Analytics vendors offered – very poor results, dropped
§ New techniques – getting better but don’t give up your taxonomist day job
§ Automatic – but not a taxonomy – cluster of co-occurring terms
  – Suggest terms and relationships
§ Text mining on steroids
§ “Automatic” – huge human effort to design approach, mathematics, select content, seed taxonomies, keyword selection, data prep – then voila!
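
To make "cluster of co-occurring terms" concrete, here is a small Python sketch that counts which terms appear together in the same document and suggests related terms. The function names, the toy documents, and the simple document-level counting are assumptions for illustration; it shows why such output suggests candidate terms and relationships rather than a finished taxonomy.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(docs, stopwords=frozenset()):
    """Count how many documents each unordered pair of terms shares."""
    counts = Counter()
    for doc in docs:
        # Unique terms per document, sorted so each pair has one canonical order.
        terms = sorted({w for w in doc.lower().split() if w not in stopwords})
        counts.update(combinations(terms, 2))
    return counts

def suggest_related(counts, term, min_count=2):
    """Terms that co-occur with `term` in at least min_count documents."""
    related = []
    for (a, b), n in counts.items():
        if n >= min_count and term in (a, b):
            related.append((b if a == term else a, n))
    # Most frequent first, ties broken alphabetically.
    return sorted(related, key=lambda pair: (-pair[1], pair[0]))

docs = ["dog leash collar", "dog collar vet", "cat vet litter"]
counts = cooccurrence_counts(docs)
print(suggest_related(counts, "dog"))
```

The result says only that "dog" and "collar" tend to appear together; whether that is a broader, narrower, or merely associated relationship is still a human judgment.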

AI and Taxonomy
AI: Past and Present

Key Ideas in Text Analytics
  – AI / Deep Learning to the Rescue?
  – Machine Learning vs. Rules-based
    • Right kind of rules – general structure
    • Learning with rules, ML with structure
  – Poly-structured text – Content Types and Sections
  – New Knowledge Structures – Cognitive & Social

Key Ideas in Text Analytics
§ Machine Learning – Deep Learning but less
  – Limited granularity – high level categories, very orthogonal
  – Faster to get started and get to 60% – then the wall
§ ML – scale – can do millions of texts
  – But – both require upfront development – and once done, both can handle the same amount of content
§ Do rules take more effort to develop?
  – Some studies show it is less: “A rule-based system recoups its value in one month, compared with almost five years under the statistics-based approach”

Key Ideas in Text Analytics
§ Rules-Based
  – Rule-based system reported 92 percent accuracy and a fourfold increase in productivity.
  – Less up front cost, and less time spent refining
§ Statistical Approach
  – Maximum accuracy achieved, 72 percent; productivity doubled.
§ Why do IT departments favor ML?
  – ML uses programmers and statisticians – more of them available than librarians, taxonomists, metadata, puzzle people
§ Future = Combine machine learning and rules
  – Application Level to categorization language level

Key Ideas in Text Analytics
  – AI / Deep Learning to the Rescue?
  – Machine Learning vs. Rules-based
  – Poly-structured text – Content Types and Sections
    • Deep Text a foundation for multiple applications
    • Using sections for better auto rules
  – New Knowledge Structures – Cognitive & Social

Adding Structure to Unstructured Content
§ Content Type – defined by sections
  – Blogs, Announcements, Articles, Press Releases, News, Case Reports, Correspondence
§ Sections
  – Metadata and text indicators – rules to find
  – Document Level: Title-Keywords, Abstract, summary, etc.
  – Special sections – Methods, Objectives, Results, etc.
  – Data patterns – dates, addresses – need context rules
  – Weights – ignore all but section text to sophisticated weighting
§ Clusters and machine learning – at section level, not document
  – Clusters as sections, clusters within sections
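
The section-finding and weighting ideas above might be sketched in Python as follows. The heading pattern ("Abstract:", "Methods:", and so on at the start of a line), the weight values, and the function names are all assumptions for illustration; real documents usually need richer text indicators and context rules.

```python
import re

# Hypothetical weights: a term counts more in the title than in methods.
SECTION_WEIGHTS = {"title": 3.0, "abstract": 2.0, "methods": 0.5, "results": 1.5, "body": 1.0}

# Assumed text indicator: a known section name followed by a colon.
HEADING = re.compile(r"^(title|abstract|methods|objectives|results)\s*:\s*(.*)$", re.IGNORECASE)

def split_sections(text):
    """Map section name -> section text, using the heading rule above."""
    sections = {}
    current = "body"  # text before any recognized heading
    for line in text.splitlines():
        line = line.strip()
        m = HEADING.match(line)
        if m:
            current = m.group(1).lower()
            line = m.group(2)
        sections.setdefault(current, []).append(line)
    return {name: " ".join(p for p in parts if p) for name, parts in sections.items()}

def weighted_term_score(text, term):
    """Sum occurrences of a term, weighted by the section each one falls in."""
    score = 0.0
    for name, content in split_sections(text).items():
        hits = content.lower().split().count(term.lower())
        score += SECTION_WEIGHTS.get(name, 1.0) * hits
    return score

doc = (
    "Title: Protein folding study\n"
    "Abstract: We study protein folding.\n"
    "Methods: Protein samples were frozen.\n"
    "Results: Folding improved."
)
print(weighted_term_score(doc, "protein"))
```

Setting every weight except one to zero gives the "ignore all but section text" end of the range; graded weights like these sit toward the "sophisticated weighting" end.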

Key Ideas in Text Analytics
  – AI / Deep Learning to the Rescue?
  – Machine Learning vs. Rules-based
  – Poly-structured text – Content Types and Sections
  – New Knowledge Structures – Cognitive & Social
    • Relational Frame Theory
    • Deep Psychology/marketing

New Knowledge Structures
§ Multiple types of Knowledge Organization
  – Taxonomy – concepts, hierarchical
  – Ontology – any type of relationship, things and concepts
  – Knowledge Graphs – triples, unlimited, no overall structure, best for facts
§ K Graphs and hierarchical – best way to merge?
  – Modules, facets
  – Hierarchical network models
§ New types – cognitive science – RFT, other?
  – Brain is more than a network – universal language detector
  – Child at 6-9 months – tell the difference between words – forwards and backwards – in any language
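
The contrast between a hierarchical taxonomy and a knowledge graph of triples can be shown with a short Python sketch. The data, the predicate names, and the helper functions are hypothetical; this is one simple way to represent the two structures, not a recommendation for how to merge them.

```python
# Taxonomy: each concept has at most one parent, so the structure is a tree.
taxonomy_parent = {"hound": "dog", "dog": "mammal", "mammal": "animal"}

def ancestors(term):
    """Walk up the hierarchy from a term to the root."""
    chain = []
    while term in taxonomy_parent:
        term = taxonomy_parent[term]
        chain.append(term)
    return chain

# Knowledge graph: (subject, predicate, object) triples with any relationship.
triples = [
    ("dog", "is_a", "mammal"),
    ("Rex", "instance_of", "dog"),
    ("Rex", "lives_in", "Boston"),
]

def objects_of(subject, predicate):
    """All objects linked to a subject by a given predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

print(ancestors("hound"))
print(objects_of("Rex", "lives_in"))
```

Hierarchical links can be stored as triples too (as with `is_a` here), which is one reason the merge question on the slide is about how to keep the overall structure, not whether the data can be combined.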

AI and Taxonomy
§ Relational Frame Theory – RFT
  – Coordination (similarity) – a dog is the same as a hound – types of similarity? Taxonomy of similarities?
  – Distinction (difference) – a white dog is different from a black dog
  – Opposition – a black dog versus a white cat
  – Comparison – this dog is bigger than that dog
  – Spatial – this dog is on the left
  – Temporal – I fed the dog before the cat
  – Hierarchical – a dog is a sort of mammal
  – Causal – a dog bite causes me to cry

Conclusions
§ AI-Deep Learning – still “Two years away”
§ Deep Text
  – Linguistic and cognitive depth – human-like learning
  – Integration of multiple techniques and modules
  – Infrastructure – Move fast with a stable infrastructure
§ Enjoy the conference!
§ Stay tuned – Next Year – TAF season II – better than ever!
  – New generation of text analytics software?

Questions?
Tom Reamy
tomr@kapsgroup.com
KAPS Group – Knowledge Architecture Professional Services
http://www.kapsgroup.com