Fine-grained Language Identification with Multilingual CapsNet Model
Fine-grained Language Identification with Multilingual CapsNet Model. Mudit Verma (1) & Arun Balaji Buduru (2). (1) Arizona State University, USA; (2) IIIT-Delhi, India
Need • Emergency call routing services • Intelligent voice assistants • Conversational AI • Speech is the easiest form of communication for humans.
Trends • Language embeddings (i-vectors) • Hand-crafted features (& phoneme detection): Mel-frequency cepstral coefficients, spectrograms. P. Verma and P. K. Das, “i-vectors in speech processing applications: a survey,” International Journal of Speech Technology, vol. 18, no. 4, pp. 529-546, Dec 2015. [Online]. Available: https://doi.org/10.1007/s10772-015-9295-3
Trends • View it as a classification problem • SVM / GMM / HMM • Logistic regression • Fully connected neural networks • BLSTM • CNN (VGG-based)
Issues • Manual feature extraction is hard • Large data requirements • Limited robustness to noise
Fine-Grained LID Problem Characteristics: 1. Short spoken audio snippets (5-10 s) 2. Multiple languages 3. Noise 4. Exiguous (scarce) training data 5. Trivial (easy) data collection 6. Non-class identification 7. Multilingual
Dataset - Languages: Arabic (AR)*, Bengali (BE)*, Chinese (Mandarin) (CH)*, English (EN)*, Hindi (HI)*, Turkish**, Spanish**, Japanese**, Punjabi**, Portuguese** (*used for language identification; **used for non-class tests) • Cover Indian languages alongside more widely used languages for the LID task • Diverse set of languages • Exiguous data requirements help with regional LID
Dataset - Collection • Type: audio recordings of local and global news, interviews, speeches, etc. • Source: YouTube • Data size: 100 hrs/language for the LID task (500 hrs total); 30 hrs/language for the non-class task (150 hrs total) • Train/test split: 70/30 hrs per language for LID; 20/10 hrs per language for non-class
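As a rough illustration of how the split sizes translate into training samples, the sketch below cuts long recordings into fixed-length clips and counts how many 5-second clips fit into the 70/30 hour split. The 16 kHz sampling rate, the non-overlapping segmentation, and the helper names `split_into_clips` and `clips_per_split` are assumptions for illustration only; the slides do not state how the recordings were segmented.

```python
import numpy as np

SAMPLE_RATE = 16_000  # assumed sampling rate; not stated on the slides

def split_into_clips(waveform: np.ndarray, clip_seconds: float, sr: int = SAMPLE_RATE) -> np.ndarray:
    """Hypothetical helper: cut a 1-D waveform into non-overlapping clips, dropping the short remainder."""
    clip_len = int(clip_seconds * sr)
    n_clips = len(waveform) // clip_len
    return waveform[: n_clips * clip_len].reshape(n_clips, clip_len)

def clips_per_split(hours: float, clip_seconds: float) -> int:
    """Hypothetical helper: number of whole clips that fit in a given number of hours of audio."""
    return int(hours * 3600 // clip_seconds)

# 70 h train / 30 h test per language for the LID task, cut into 5 s clips
train_clips = clips_per_split(70, 5)   # 50400 clips
test_clips = clips_per_split(30, 5)    # 21600 clips
```

With 10-second inputs the same arithmetic halves the clip counts.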
Dataset - Characteristics • Trivial, easy data collection • Various types of noise: 1. Heard/not understandable (speech is present but not intelligible): background cheers, slogans 2. Heard/understandable (multiple spoken languages): interviews/news reporting in multiple languages 3. Unheard (noise but no spoken language): chimes, mic noise
Dataset - Processing • .wav format • Spectrogram representation • Discretized using a Hann window & 129 frequency bins • 8-bit grayscale
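The processing steps above can be sketched in NumPy. A 256-point FFT yields 256 // 2 + 1 = 129 frequency bins, which matches the slide; the hop size of 128 samples, the log scaling, and the min-max normalization to 0-255 are assumptions, not details given by the authors.

```python
import numpy as np

def spectrogram_u8(waveform: np.ndarray, n_fft: int = 256, hop: int = 128) -> np.ndarray:
    """Hann-windowed magnitude spectrogram rendered as an 8-bit grayscale image.

    n_fft=256 gives 129 frequency bins; hop size and log scaling are assumptions.
    Expects len(waveform) >= n_fft.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(waveform) - n_fft) // hop
    frames = np.stack([waveform[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T       # shape: (129, n_frames)
    log_mag = np.log1p(magnitude)                            # compress dynamic range
    span = log_mag.max() - log_mag.min()
    scaled = (log_mag - log_mag.min()) / span if span > 0 else np.zeros_like(log_mag)
    return np.round(scaled * 255).astype(np.uint8)           # 8-bit grayscale image
```

Each column of the result is one time frame and each row one frequency bin, so a clip becomes a 129-pixel-tall image that an image classifier can consume.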
Work • Treat the problem in the image domain • Use Capsule Networks (CapsNet) for classification • Compare with CNN + RNN variants: CNN + Bi-GRU, CNN + Bi-LSTM, CNN + Bi-GRU + Attention • Test a deeper variant of CapsNet • Verify non-class detection (out-of-distribution samples)
Image Domain • Use spectrograms • Mel-frequency cepstral coefficients (MFCCs) do not help much.
CapsNets - Theory • CNNs are great, but they have a problem: • Positional invariance (due to pooling layers), which discards the spatial relationships between features • Poor tolerance to viewpoint changes. S. Sabour, N. Frosst, and G. E. Hinton, “Dynamic routing between capsules,” in Advances in Neural Information Processing Systems, 2017, pp. 3856-3866.
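A tiny NumPy illustration of the positional-invariance point: with 2x2 max pooling, a feature at the top-left of a pooling window and the same feature shifted within that window produce identical outputs, so the exact position is lost. The `max_pool_2x2` helper is a hypothetical illustration, not code from the talk.

```python
import numpy as np

def max_pool_2x2(x: np.ndarray) -> np.ndarray:
    """Non-overlapping 2x2 max pooling on an (H, W) array with even H and W."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

a = np.zeros((4, 4)); a[0, 0] = 1.0   # feature at the top-left of its pooling window
b = np.zeros((4, 4)); b[1, 1] = 1.0   # same feature shifted inside the same window
print(np.array_equal(max_pool_2x2(a), max_pool_2x2(b)))  # True: the exact position is lost
```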
CapsNets - Theory • Solves the “Picasso problem” (a CNN may still detect a face when its parts are present but scrambled) • CapsNets replace scalar-output feature detectors with vector-output capsules, and max-pooling with routing-by-agreement. • Because capsules are independent, when multiple capsules agree, the probability of a correct detection is much higher. S. Sabour, N. Frosst, and G. E. Hinton, “Dynamic routing between capsules,” in Advances in Neural Information Processing Systems, 2017, pp. 3856-3866.
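Routing-by-agreement from Sabour et al. (2017) can be sketched in a few lines of NumPy: coupling coefficients are a softmax over routing logits, upper capsules are the squashed weighted sum of lower-capsule predictions, and the logits grow with the dot-product agreement between predictions and outputs. This is a minimal sketch of the published algorithm, not the authors' implementation; the input shapes and the 3 iterations are illustrative choices.

```python
import numpy as np

def squash(s: np.ndarray, axis: int = -1, eps: float = 1e-9) -> np.ndarray:
    """Shrink vector length into [0, 1) while keeping its direction."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat: np.ndarray, n_iters: int = 3):
    """Routing-by-agreement.

    u_hat: predictions from lower capsules i for upper capsules j, shape (n_in, n_out, dim).
    Returns upper-capsule outputs v (n_out, dim) and coupling coefficients c (n_in, n_out).
    """
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))                          # routing logits start at zero
    for _ in range(n_iters):
        c = np.exp(b - b.max(axis=1, keepdims=True))
        c /= c.sum(axis=1, keepdims=True)                # softmax over upper capsules
        s = np.einsum("ij,ijd->jd", c, u_hat)            # weighted sum of predictions
        v = squash(s)                                    # upper-capsule output vectors
        b = b + np.einsum("ijd,jd->ij", u_hat, v)        # agreement = dot product
    return v, c
```

In a classification CapsNet there is one output capsule per class, and the length of each output vector (always below 1 after squashing) serves as that class's score.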
CapsNet Architecture • Bottleneck
Baseline - CNN-(RNN)-Attention. C. Bartz, T. Herold, H. Yang, and C. Meinel, “Language identification using deep convolutional recurrent neural networks,” in International Conference on Neural Information Processing. Springer, 2017, pp. 880-889.
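The attention variant of the baseline collapses the recurrent outputs over time into a single vector before classification. The sketch below assumes a common additive-attention form (tanh projection followed by a learned context vector); the slides do not specify the exact attention used, and the parameter names `w` and `u` are hypothetical.

```python
import numpy as np

def attention_pool(h: np.ndarray, w: np.ndarray, u: np.ndarray) -> np.ndarray:
    """Additive attention over time steps (assumed form).

    h: recurrent (e.g. Bi-GRU) outputs, shape (T, d); w: (d, a) projection; u: (a,) context vector.
    Returns a single (d,) summary vector.
    """
    scores = np.tanh(h @ w) @ u                          # one score per time step, shape (T,)
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                                 # attention weights sum to 1
    return alpha @ h                                     # weighted average of time steps
```

Compared with taking only the last hidden state, this lets the classifier weight the frames where language cues are strongest.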
Non-Class Detection • Verification step • Is CapsNet more robust* than the baseline? • Thresholding mechanism (*robustness here is measured over several languages)
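One simple way to realize a thresholding mechanism is to reject a clip as non-class when even the best language score falls below a cut-off. The sketch below assumes per-language scores such as CapsNet output-capsule lengths; the threshold value, the `LANGUAGES` ordering, and the `predict_with_rejection` helper are hypothetical and would need tuning on held-out data.

```python
import numpy as np

LANGUAGES = ["AR", "BE", "CH", "EN", "HI"]  # the five LID languages (1-5)

def predict_with_rejection(class_scores: np.ndarray, threshold: float) -> str:
    """Return the top language, or 'non-class' if no score clears the threshold.

    class_scores: per-language confidences, e.g. output-capsule lengths in [0, 1).
    threshold: hypothetical cut-off, to be tuned on held-out data.
    """
    best = int(np.argmax(class_scores))
    return LANGUAGES[best] if class_scores[best] >= threshold else "non-class"
```

Clips from languages 6-10 should ideally fall below the threshold for every trained language.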
Results - CapsNet is bad. Or is it?
Results - ROC curves & AUC scores for languages (1-5) by CapsNet (with MidCaps layers), 5-second audio input. Left: test data. Right: train data.
Results - ROC curves & AUC scores for languages (1-5) by Bi-GRU, 10-second audio input. Left: test data. Right: train data.
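Per-language AUC scores like those in the ROC plots are typically computed one-vs-rest: the AUC equals the probability that a randomly chosen clip of the target language is scored higher than a randomly chosen clip of any other language. The sketch below computes this directly as a Mann-Whitney statistic in NumPy; it is an illustrative computation, not the authors' evaluation code.

```python
import numpy as np

def one_vs_rest_auc(scores: np.ndarray, labels: np.ndarray, positive: int) -> float:
    """AUC for one language vs. all others via the Mann-Whitney U statistic.

    scores: predicted score for the `positive` class, shape (N,).
    labels: true class index per sample, shape (N,). Ties count as half.
    """
    pos = scores[labels == positive]
    neg = scores[labels != positive]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))
```

The pairwise comparison is O(N_pos x N_neg) in memory, which is fine for a sketch; a rank-based version scales better to large test sets.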
Results - See a pattern? HI > AR > BE ~ CH > EN. Should we expect this order?
Results - Non-class detection for languages (6-10) • Why is Japanese low? • Why is Punjabi high? • Portuguese & Spanish • Turkish ~ Arabic
Results - CapsNet is Multilingual. S. Toshniwal, T. N. Sainath, R. J. Weiss, B. Li, P. Moreno, E. Weinstein, and K. Rao, “Multilingual speech recognition with a single end-to-end model,” in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2018, pp. 4904-4908.
Future Work • Recurrent layers with CapsNet • Neural architecture search (NAS) over CapsNet architectures • Non-class detection
Thank You