
Text Analytics Mini-Workshop Seminar
Prof. Dr. Chris Biemann

Introduction to the First Darmstadt Workshop on Text Analytics: Unsupervised Natural Language Processing
Darmstadt, February 7, 2012

WS 2011/12 | Computer Science Department | UKP Lab - Prof. Dr. Chris Biemann

Text Analytics: Unsupervised NLP

Topics: Co-occurrence, Clustering Methods, Language Statistics, Topic Models, Latent Semantic Analysis, U-DOP Parsing, Unsupervised POS Tagging, Word Sense Induction, Unsupervised Morphology, Structure Discovery, Text Mining, Language Models

§ find structure in large amounts of text data
§ use this to improve NLP applications

How does ‘normal’ NLP work?

§ NLP pipeline: several preprocessing steps motivated by linguistic layers
§ Realization of single steps:
  § rules
  § use of lexical resources
  § machine learning on annotated data

Pipeline: Text Document → POS, Stemming, Chunking → Named Entity Recognition → Syntactic Analysis → Semantic Analysis → Processed Text Document → TASK

Acquisition bottleneck: creating rules, resources, or annotated data is a huge effort.

Structure Discovery: unsupervised NLP

§ Preprocessing steps are induced
§ these can build on each other
§ task-based assessment of the utility of steps
§ no need to create training data for intermediate steps
§ still need to define the task

Pipeline: Text Document → unsupervised POS, unsupervised Morphology, unsupervised Multiword → unsupervised Entity Finding → unsupervised Syntactic Analysis → unsupervised Semantic Analysis → Processed Text Document → TASK

Success:
§ replacement of standard steps by induced steps at equal performance
§ performance improvements
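The idea of task-based assessment can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the workshop: the toy suffix induction in `induce_suffix_stripper`, the token-overlap matching task, and the four example sentences. The induced step sees only raw text. Labels are used only to score the final task.

```python
from collections import Counter
from typing import Callable, List, Tuple

Step = Callable[[List[str]], List[str]]


def tokenize(text: str) -> List[str]:
    return text.lower().split()


def induce_suffix_stripper(corpus: List[str], min_count: int = 2) -> Step:
    """Induce suffixes from raw text only (hypothetical toy method).

    An ending counts as a suffix candidate if removing it leaves another
    word that is attested in the corpus; no annotated data is used.
    """
    vocab = {tok for text in corpus for tok in tokenize(text)}
    counts = Counter(
        word[i:]
        for word in vocab
        for i in range(1, len(word))
        if word[:i] in vocab
    )
    suffixes = {s for s, c in counts.items() if c >= min_count}

    def strip(tokens: List[str]) -> List[str]:
        out = []
        for tok in tokens:
            for i in range(1, len(tok)):
                # Strip only if the remainder is itself an attested word.
                if tok[i:] in suffixes and tok[:i] in vocab:
                    tok = tok[:i]
                    break
            out.append(tok)
        return out

    return strip


def run_pipeline(text: str, steps: List[Step]) -> List[str]:
    tokens = tokenize(text)
    for step in steps:  # steps can build on each other
        tokens = step(tokens)
    return tokens


def task_accuracy(steps: List[Step], examples: List[Tuple[str, str, bool]]) -> float:
    """Toy task: do two texts match? Predicted by token overlap."""
    correct = 0
    for a, b, gold in examples:
        overlap = set(run_pipeline(a, steps)) & set(run_pipeline(b, steps))
        correct += bool(overlap) == gold
    return correct / len(examples)


# Only the final task is labeled; the intermediate step needs no labels.
EXAMPLES = [
    ("cats sleep", "cat sleeps", True),
    ("dogs bark", "cat sleeps", False),
]
corpus = [text for a, b, _ in EXAMPLES for text in (a, b)]
stemmer = induce_suffix_stripper(corpus)

baseline = task_accuracy([], EXAMPLES)
with_induced = task_accuracy([stemmer], EXAMPLES)
print(baseline, with_induced)
```

On this toy data the induced step raises task accuracy from 0.5 to 1.0, which is the kind of comparison the slide calls a success: an induced step is kept if the task performs at least as well with it as without it.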

Workshop Program

Session 1: S 2/02 C 120
§ 15:30 - 15:40 Chris Biemann: Workshop Introduction
§ 15:40 - 16:00 Jens Haase: Punctuation Correction with N-grams
§ 16:00 - 16:20 Irina Alles: Punctuation Correction using Web Counts
§ 16:20 - 16:40 Richard Steuer: Distributional Similarity for Text Similarity
§ 16:40 - 17:00 Johannes Schwandke: Determining Semantically Similar Words Using Co-occurrences and Word Stems

17:00 - 17:20 Coffee Break in A 126

Workshop Program II

Session 2: S 2/02 A 126
§ 17:20 - 17:40 Tobias Krönke: Co-occurrence Relations for Text Segmentation
§ 17:40 - 18:00 Leander Baumann: TextTiling with Co-occurrence Data
§ 18:00 - 18:20 Konstantin Tennhard: Clustering of Movie Subtitles
§ 18:20 - 18:40 David Kaufmann: Distant Supervision with Wikipedia Infoboxes
§ 18:40 - 19:00 Marcel Ackermann: Distant Supervised Relation Extraction with Wikipedia and Freebase

Talks: 15 min. Questions: 5 min. Questions from authors preferred.