Huawei CBG AI Challenges
Parkhomenko Denis, Bankevich Sergey, Korikov Kirill, Bakarov Amir
www.huawei.com
HUAWEI TECHNOLOGIES CO., LTD.

Huawei CBG AI Challenges
• Computer Vision Challenges
• Speech & Language Challenges
  - NLU
  - ASR

How to make a neural net lighter?
State-of-the-art neural nets are very expensive in terms of
  - computation
  - size
How can they be fitted onto such small chips?
[Figure: ImageNet-1K validation set accuracy]

How to make a neural net lighter?
• Tensor decomposition
• Filter quantization, dictionary-based convolutions
• Target platform optimization:
  - deep knowledge of CPU/TPU architecture
  - vectorization, intrinsics, code optimization
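As a rough illustration of filter quantization, the sketch below stores each weight as an 8-bit code plus a shared scale and offset, which cuts storage roughly 4x compared with 32-bit floats. The function names and weight values are hypothetical and not taken from the slides; this is plain uniform quantization, not the dictionary-based variant.

```python
# Sketch: uniform 8-bit quantization of one filter's weights.
# All names and values are illustrative assumptions, not from the slides.

def quantize(weights, num_bits=8):
    """Map floats to integer codes in [0, 2**num_bits - 1]."""
    qmax = 2 ** num_bits - 1
    w_min, w_max = min(weights), max(weights)
    # Fall back to 1.0 so constant weights do not produce a zero scale.
    scale = (w_max - w_min) / qmax or 1.0
    codes = [round((w - w_min) / scale) for w in weights]
    return codes, scale, w_min


def dequantize(codes, scale, w_min):
    """Recover approximate float weights from the integer codes."""
    return [c * scale + w_min for c in codes]


weights = [-0.51, -0.12, 0.0, 0.07, 0.33, 0.49]
codes, scale, offset = quantize(weights)
restored = dequantize(codes, scale, offset)

# The rounding error per weight is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(codes, max_error)
```

The trade-off is visible in `max_error`: fewer bits mean a larger step and a larger worst-case error per weight.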

Optical character recognition
• Task 1: Text detection in an image
• Task 2: Text recognition in a cropped image
• Task 3: End-to-end detection + recognition
• Task 4: Inpainting
Humans are still better at this [https://towardsdatascience.com/image-inpainting-humans-vs-ai-48fc4bca7ecc]

Dialogue System

Intent Detection
Amazon Alexa Skill Builder interface

Intent Detection
• Corpus
• Word-level vectors
  - Word vector
  - External information: word vector, dictionary, ...
• Sentence-level vectors
  - Word-level features: entity information, syntax parsing features, ...
  - Whole-sentence features
  - Rule-based heuristics
  - ...
• Neural network (network selection)
  - RNN
  - CNN
  - Attention / Transformer
  - …
• Sentence vector
• More networks
• Softmax
• Intent classification
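The flow from word vectors to a sentence vector to a softmax over intents can be sketched as follows. This is a toy under stated assumptions: the 3-dimensional word vectors, the intent names, and the hand-set weights are all hypothetical, and plain averaging stands in for the RNN/CNN/Transformer encoder.

```python
import math

# Toy word vectors; a real system would use pretrained embeddings.
WORD_VECTORS = {
    "play": [0.9, 0.1, 0.0],
    "music": [0.8, 0.2, 0.1],
    "weather": [0.0, 0.9, 0.2],
    "today": [0.1, 0.7, 0.3],
}
INTENTS = ["PlayMusic", "GetWeather"]
# One weight row per intent (hypothetical, hand-set rather than trained).
WEIGHTS = [[2.0, -1.0, 0.0], [-1.0, 2.0, 0.5]]


def sentence_vector(tokens):
    """Average the vectors of known words to get one sentence vector."""
    known = [WORD_VECTORS[t] for t in tokens if t in WORD_VECTORS]
    if not known:
        return [0.0] * 3
    return [sum(dim) / len(known) for dim in zip(*known)]


def softmax(scores):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(text):
    """Return the most probable intent and the full probability list."""
    vec = sentence_vector(text.lower().split())
    scores = [sum(w * x for w, x in zip(row, vec)) for row in WEIGHTS]
    probs = softmax(scores)
    best = max(range(len(INTENTS)), key=probs.__getitem__)
    return INTENTS[best], probs


print(classify("play music"))
print(classify("weather today"))
```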

NLU Challenges
• A deep learning model needs a lot of labeled data
• For our own skills, we can use assessors to generate and classify the corpus
• But for third-party skills, we can rely only on the provided corpus (usually tens of samples)
• Is it possible to build a good classifier using such a small amount of data?

Challenge
Example:
  - 10 samples of labeled data
  - 100 samples of unlabeled data
  - Train the model on the 100 samples and transfer the labels
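The slide does not say how labels are transferred. One simple possibility, shown here only as a sketch, is to give each unlabeled utterance the label of its most similar labeled utterance and then train a classifier on the enlarged set. The bag-of-words cosine similarity, the function names, and the sample utterances are all assumptions made for illustration.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Count lowercase tokens in an utterance."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def transfer_labels(labeled, unlabeled):
    """Give each unlabeled utterance the label of its nearest labeled one."""
    result = []
    for text in unlabeled:
        vec = bag_of_words(text)
        _, best_label = max(
            labeled, key=lambda pair: cosine(vec, bag_of_words(pair[0]))
        )
        result.append((text, best_label))
    return result


# Hypothetical stand-ins for the 10 labeled and 100 unlabeled samples.
labeled = [("play some music", "PlayMusic"), ("what is the weather", "GetWeather")]
unlabeled = ["play music please", "weather for tomorrow"]
pseudo_labeled = transfer_labels(labeled, unlabeled)
print(pseudo_labeled)
```

In practice the similarity would more likely come from pretrained sentence embeddings than from raw token overlap, and low-similarity matches would be discarded rather than labeled.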

More challenges
• Word sense disambiguation
• Cross-lingual transfer
• Integration of knowledge graphs into supervised models
• Anaphora and coreference resolution
• Chit-chat support
• Personalization of conversational agents
• …

ASR task
• Convert audio input to text output
Applications
• Voice assistants (phone, home, car)
• Recording/voice input transcription
• Movie captions

ASR pipeline
• Feature extraction
• Acoustic model: phonemes/letters
• Language model, decoder: text
• Postprocessing
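The first pipeline stage, feature extraction, usually starts by cutting the signal into short overlapping frames. The sketch below computes one log-energy value per frame; the 25 ms frame and 10 ms hop are common choices assumed here rather than values from the slides, and real systems typically compute richer features (for example, filterbank energies) per frame.

```python
import math


def frame_log_energy(samples, sample_rate, frame_ms=25, hop_ms=10):
    """Split a signal into overlapping frames and return each frame's log energy."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    features = []
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        # The small constant avoids log(0) on silent frames.
        features.append(math.log(energy + 1e-10))
    return features


# One second of a synthetic 440 Hz tone at 16 kHz, standing in for recorded audio.
sample_rate = 16000
signal = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
features = frame_log_energy(signal, sample_rate)
print(len(features), features[0])
```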

ASR components
• Support specific input conditions
  - Language, accent
  - Close/far field
  - Deal with noise, multiple people speaking, low volume/quality
  - Different hardware
• Provide specific output properties
  - Normalization
  - Domains
• Related tasks
  - Voice activity detection
  - Trigger phrase
  - Direct classification

ASR challenges
• Speaker diarization, cocktail party problem, denoising
• Flexible language model
• Handling a variety of accents
• On-device ASR
• Text normalization
• Optimization for production: C/C++, low-level
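Text normalization can mean several things. One reading, assumed here, is converting the spoken-form words produced by the recognizer into written form, such as number words into digits. The rule set below is a deliberately tiny, hypothetical sketch covering only 0 to 99; production systems need far broader rules (dates, currencies, abbreviations) and usually language-specific grammars.

```python
# Number words 0..19 and the tens 20..90, built from word lists.
UNITS = {
    word: value
    for value, word in enumerate(
        "zero one two three four five six seven eight nine ten eleven twelve "
        "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split()
    )
}
TENS = {
    word: 10 * value
    for value, word in enumerate(
        "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2
    )
}


def normalize(text):
    """Lowercase the text and replace number words (0..99) with digits."""
    tokens = text.lower().split()
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in TENS:
            value = TENS[tok]
            # "twenty five" -> 25: absorb a following unit between 1 and 9.
            nxt = tokens[i + 1] if i + 1 < len(tokens) else None
            if nxt in UNITS and 1 <= UNITS[nxt] <= 9:
                value += UNITS[nxt]
                i += 1
            out.append(str(value))
        elif tok in UNITS:
            out.append(str(UNITS[tok]))
        else:
            out.append(tok)
        i += 1
    return " ".join(out)


print(normalize("set alarm for seven thirty five"))
print(normalize("Call me in twenty minutes"))
```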

Thank you
www.huawei.com