An Overview of Recent Trends in NLP: ELMo, ULMFiT, GPT, BERT, MT-DNN, GPT-2, ERNIE

Deep Contextualized Word Representations (ELMo)
What is a good word embedding? It should model:
• the complex characteristics of word use, e.g. syntax and semantics;
• how those uses vary across linguistic contexts.
The problem with earlier methods is polysemy: a single vector per word cannot separate senses, e.g. “play” can mean a game (游戏), a musical performance (演奏), or a match (比赛).

Embeddings from Language Models (ELMo)
Key points:
• Trained with a coupled forward and backward language model objective on a large text corpus.
• Contextualized: the context-independent token representation at the bottom is transformed into context-dependent representations by the biLM layers before the softmax layer.
(Figure: biLM architecture, from the token representation up through the LSTM layers to the softmax layer.)
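
For reference, the coupled biLM objective jointly maximizes the forward and backward LM log-likelihoods, sharing the token-representation parameters Θx and softmax parameters Θs between directions; following the notation of the ELMo paper it can be written as:

    \sum_{k=1}^{N} \Big( \log p(t_k \mid t_1, \dots, t_{k-1};\ \Theta_x, \overrightarrow{\Theta}_{\mathrm{LSTM}}, \Theta_s)
                  + \log p(t_k \mid t_{k+1}, \dots, t_N;\ \Theta_x, \overleftarrow{\Theta}_{\mathrm{LSTM}}, \Theta_s) \Big)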

Features extracted by a deep neural network: the lower the layer, the more basic the features it captures.

The highlight of ELMo: task-specific combination.
The biLM layers form a feature hierarchy: word features at the bottom, syntactic features in the middle, semantic features at the top. To use ELMo, the output of each j-th layer is combined with softmax-normalized weights, and a task-specific scalar scales the entire ELMo vector, as sketched below.
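
A minimal sketch of that combination (the function and variable names are illustrative, not ELMo's actual API):

    import numpy as np

    def elmo_vector(layer_outputs, s_logits, gamma):
        """Collapse the biLM layers for one token into a single ELMo vector.

        layer_outputs: list of L+1 vectors h_j (token layer plus each biLM layer)
        s_logits:      task-specific weights, softmax-normalized below
        gamma:         task-specific scalar that scales the entire vector
        """
        s = np.exp(s_logits - np.max(s_logits))
        s = s / s.sum()                       # softmax-normalized weights s_j
        weighted = sum(w * h for w, h in zip(s, layer_outputs))
        return gamma * weighted               # ELMo_k = gamma * sum_j s_j * h_j

    # Toy usage: 3 layers with 4-dimensional hidden states
    layers = [np.random.randn(4) for _ in range(3)]
    print(elmo_vector(layers, s_logits=np.zeros(3), gamma=1.0))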

ELMo Evaluation: nearest neighbors to “play” using GloVe and ELMo.

Universal Language Model Fine-tuning for Text Classification (ULMFiT)
Two ways to apply pre-trained language representations:
• Feature-based pre-training: ELMo
• Fine-tuning: ULMFiT, BERT, ...
The difficulty of fine-tuning is retaining previous knowledge, i.e. avoiding catastrophic forgetting during fine-tuning.
Steps:
• General-domain LM pretraining
• Target task LM fine-tuning
• Target task classifier fine-tuning

ULMFiT: target task LM fine-tuning and classifier fine-tuning.
To avoid catastrophic forgetting, fine-tuning combines (see the schedule sketch below):
• Discriminative fine-tuning (across layers): each layer is tuned with its own learning rate.
• Slanted triangular learning rates (across time): the rate rises linearly for a short warm-up, then decays linearly.
• Gradual unfreezing: first unfreeze the last layer and fine-tune all unfrozen layers for one epoch, then unfreeze the next lower frozen layer and repeat.
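
A sketch of the slanted triangular schedule, following the formula in the ULMFiT paper (the default values eta_max=0.01, cut_frac=0.1, ratio=32 are the paper's):

    def slanted_triangular_lr(t, T, eta_max=0.01, cut_frac=0.1, ratio=32):
        """Learning rate at step t of T: short linear warm-up, long linear decay."""
        cut = int(T * cut_frac)              # step at which the rate peaks
        if t < cut:
            p = t / cut                      # increasing phase
        else:
            p = 1 - (t - cut) / (cut * (1 / cut_frac - 1))   # decaying phase
        return eta_max * (1 + p * (ratio - 1)) / ratio

    # Example: the rate at a few points of a 1000-step run
    for t in [0, 50, 100, 500, 999]:
        print(t, round(slanted_triangular_lr(t, T=1000), 5))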

ULMFiT Evaluation

Improving Language Understanding by Generative Pre-Training (GPT)
Key point: using the Transformer architecture.
Steps:
• Unsupervised pre-training
• Supervised fine-tuning, keeping the LM loss as an auxiliary objective (sketched below)
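
The GPT paper combines the supervised task loss L2 with the LM loss L1 as L3 = L2 + λ·L1, which helps generalization and speeds convergence. A toy sketch of the arithmetic (the loss values are made up):

    def fine_tuning_loss(task_loss, lm_loss, lam=0.5):
        """GPT's combined fine-tuning objective: L3 = L2 + lambda * L1."""
        return task_loss + lam * lm_loss

    # Toy numbers standing in for the classifier loss L2 and LM loss L1
    print(fine_tuning_loss(task_loss=0.7, lm_loss=2.3))   # 0.7 + 0.5 * 2.3 = 1.85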

GPT Evaluation: achieves state of the art on 9 out of the 12 datasets evaluated.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Key points: bidirectional Transformers and a larger corpus.
• Unsupervised pre-training with two objectives: Masked LM and Next Sentence Prediction (the masking step is sketched below).
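
A sketch of the Masked LM corruption step: 15% of tokens are selected, and of those, 80% become [MASK], 10% become a random token, and 10% stay unchanged (tokenization and vocabulary are simplified here):

    import random

    def mask_tokens(tokens, vocab, mask_prob=0.15):
        """Return (corrupted tokens, prediction targets) for one sequence."""
        corrupted, targets = [], []
        for tok in tokens:
            if random.random() < mask_prob:   # select ~15% of positions
                targets.append(tok)           # the model must predict the original
                r = random.random()
                if r < 0.8:
                    corrupted.append("[MASK]")               # 80%: replace with [MASK]
                elif r < 0.9:
                    corrupted.append(random.choice(vocab))   # 10%: random token
                else:
                    corrupted.append(tok)                    # 10%: keep unchanged
            else:
                corrupted.append(tok)
                targets.append(None)          # no prediction at this position
        return corrupted, targets

    print(mask_tokens("the cat sat on the mat".split(), vocab=["dog", "ran", "hat"]))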

Multi-Task Deep Neural Networks for Natural Language Understanding (MT-DNN)
Key point: multi-task learning (MTL).
• MTL provides an effective way of leveraging supervised data from many related tasks.
• MTL profits from a regularization effect that alleviates overfitting to any single task.
Four task types share one encoder: single-sentence classification, text similarity, pairwise text classification, and relevance ranking (training loop sketched below).
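
A minimal sketch of the multi-task training procedure: in each epoch the mini-batches of all tasks are merged and shuffled, and each batch updates the shared encoder plus that task's output head. The update_on callback is a placeholder for a real optimizer step:

    import random

    def train_epoch(task_batches, update_on):
        """task_batches: dict mapping task name -> list of mini-batches.
        update_on(task, batch) performs one gradient step on the shared
        encoder plus that task's head (stubbed out in the usage below)."""
        schedule = [(t, b) for t, bs in task_batches.items() for b in bs]
        random.shuffle(schedule)          # mix batches of all tasks within the epoch
        for task, batch in schedule:
            update_on(task, batch)

    # Toy usage with a logging stub in place of a real optimizer step
    batches = {"sst": [["b1"], ["b2"]], "qnli": [["b3"]], "sts": [["b4"]]}
    train_epoch(batches, update_on=lambda task, b: print("step on", task, b))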

Language Models are Unsupervised Multitask Learners (GPT-2.0)
Key points:
• Larger corpus: about 8 million documents.
• Deeper network than GPT-1.0 and BERT.
• Unlike BERT's bidirectional Transformers, GPT-2.0 keeps the unidirectional Transformer decoder.
• Focus on zero-shot task transfer (sketched below).
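
Zero-shot transfer means no gradient update at all: the task is written into the conditioning text, framing it as p(output | input, task), and the LM simply continues the prompt. A sketch, where lm_generate stands in for a real decoder (it is a hypothetical helper, not an actual GPT-2.0 API):

    def zero_shot_answer(lm_generate, task_prefix, text):
        """Specify the task purely through the conditioning text."""
        prompt = f"{task_prefix} {text}"
        return lm_generate(prompt)   # the LM just continues the prompt; no fine-tuning

    # Toy usage with a stub in place of a real GPT-2.0 decoder
    stub_lm = lambda prompt: "<greedy continuation of: %r>" % prompt
    print(zero_shot_answer(stub_lm, "translate english to french:", "the cat sat."))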

Conclusion: ELMo, ULMFiT, GPT, BERT, MT-DNN, GPT-2, ERNIE
Common trends:
• Universal language model pre-training and transfer learning
• Multi-task learning
• Money: ever larger corpora and models make compute budget a deciding factor