The Age of the Word (L’età della parola)
Giuseppe Attardi
Dipartimento di Informatica, Università di Pisa
ESA SoBigData, Pisa, 24 February 2015

Natural Language Learning
  • Children learn to speak naturally, by talking with others
  • Teach computers to learn language in a similarly natural way

Statistical Machine Learning
  • Training on large document collections
  • Requires the ability to process Big Data
      - If we had used the same algorithms 10 years ago, they would still be running
  • The Unreasonable Effectiveness of Big Data

Example: Machine Translation
  • Arabic to English, five-gram language models of varying size
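
To make the term "five-gram language model" concrete, here is a minimal sketch of a count-based n-gram model with unsmoothed maximum-likelihood estimates. The function names and the toy corpus are assumptions for illustration only; the slide does not say how the translation system's models were built, and production systems add smoothing and train on far larger corpora.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous window of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def train_ngram_model(tokens, n):
    """Count n-grams and the (n-1)-token contexts that precede the last word."""
    full = Counter(ngrams(tokens, n))
    context = Counter()
    for gram, count in full.items():
        context[gram[:-1]] += count
    return full, context


def probability(model, context, word):
    """Unsmoothed estimate of P(word | context); 0.0 for an unseen context."""
    full, context_counts = model
    context = tuple(context)
    seen = context_counts[context]
    if seen == 0:
        return 0.0
    return full[context + (word,)] / seen


# Toy corpus (hypothetical); a real model is trained on billions of words.
corpus = "the cat sat on the mat and the cat sat on the sofa".split()
model = train_ngram_model(corpus, 5)
p_mat = probability(model, ["cat", "sat", "on", "the"], "mat")
```

With so little data most five-word contexts are never observed, which is one reason larger training corpora help this kind of model.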

Deep Learning Breakthrough: 2006
  • Output layer: prediction of target
  • Hidden layers: learn more abstract representations
  • Input layer: raw input
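
The layer stack above can be sketched as a plain forward pass: raw input goes through the hidden layers and the output layer turns the result into a prediction. This is only an illustration under stated assumptions: the layer sizes, random weights, tanh activation and softmax output are hypothetical choices, and the sketch covers neither training nor the specific 2006 technique the slide refers to.

```python
import math
import random


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b), one output per weight row."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases (hypothetical initialisation)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(raw_input, hidden_layers, output_layer):
    """Input layer -> hidden layers -> output layer."""
    activations = raw_input
    for weights, biases in hidden_layers:
        activations = dense(activations, weights, biases, math.tanh)
    weights, biases = output_layer
    scores = dense(activations, weights, biases, lambda z: z)
    return softmax(scores)


rng = random.Random(0)
hidden = [init_layer(4, 5, rng), init_layer(5, 3, rng)]  # two hidden layers
output = init_layer(3, 2, rng)                           # two target classes
prediction = forward([0.2, -0.1, 0.7, 0.0], hidden, output)
```

Each hidden layer re-encodes the output of the layer below it, which is the sense in which deeper layers can hold more abstract representations.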

Lots of Unlabeled Data
  • Language Model
      - Corpus: 2 B words
      - Dictionary: 130,000 most frequent words
      - 4 weeks of training
  • Parallel + CUDA algorithm
      - 2 hours
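
As a rough worked figure, assuming that the 4 weeks and the 2 hours refer to the same training task (the slide does not state this explicitly), the implied speedup can be computed as follows.

```python
HOURS_PER_WEEK = 7 * 24

baseline_hours = 4 * HOURS_PER_WEEK  # 4 weeks of training
parallel_hours = 2                   # Parallel + CUDA algorithm

# Ratio of the two wall-clock times, under the same-task assumption.
speedup = baseline_hours / parallel_hours
```

Under that assumption the parallel version is about 336 times faster (672 hours against 2).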

Word Embeddings
  • Neighboring words are semantically related
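
One common way to read "neighboring" is distance in the embedding space, often measured with cosine similarity. The sketch below uses made-up three-dimensional vectors and a hypothetical `nearest` helper purely for illustration; learned embeddings typically have tens to hundreds of dimensions and come from training on a large corpus.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(word, embeddings, k=2):
    """Return the k words whose vectors are most similar to the given word's."""
    target = embeddings[word]
    scored = [
        (cosine(target, vec), other)
        for other, vec in embeddings.items()
        if other != word
    ]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]


# Toy vectors, invented for this example.
toy_embeddings = {
    "france": [0.9, 0.1, 0.0],
    "italy": [0.8, 0.2, 0.1],
    "spain": [0.85, 0.05, 0.15],
    "dog": [0.1, 0.9, 0.2],
    "cat": [0.0, 0.8, 0.3],
}
country_neighbors = nearest("france", toy_embeddings, k=2)
animal_neighbors = nearest("dog", toy_embeddings, k=1)
```

In this toy space the country vectors point in nearly the same direction, so they come out as each other's neighbors, and the same holds for the two animals.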

A Unified Deep Learning Architecture for NLP
  • NER (Named Entity Recognition)
  • POS tagging
  • Chunking
  • Parsing
  • SRL (Semantic Role Labeling)
  • Sentiment Analysis

Deep Text Analysis
  • Parsing
  • Word Sense Disambiguation
  • Anaphora Resolution
  • Information Extraction
  • Sentiment Analysis
  • Text Entailment
  • Question Answering
  • Biomedical Text Analysis

QA on Alzheimer Disease
  • Example sentence: "the γ-secretase inhibitor Semagacestat failed to slow cognitive decline"
  • Dependency labels in the figure: ROOT, OBJ, SUBJ, APPO, OBJ
  • Annotations in the figure: protein, drug, disorder, SnowMed: C0236848, substance
  • QA on Alzheimer Competition

Correlation Symptoms-Diseases

Big Data, Big Brain
  • Google DistBelief
      - Cluster capable of simulating 100 billion connections
      - Used to learn unsupervised image classification
      - Used to produce tiny ASR model
  • Similar basic capability for processing image, audio and language
  • European FET Brain project