Vienna’s largest monthly event on Deep Learning & AI
www.meetup.com/Vienna-Deep-Learning-Meetup



The Organizers: Thomas Lidy (Musimap), Alex Schindler (AIT & TU Wien), René Donner (contextflow), Jan Schlüter (OFAI & UTLN)
www.meetup.com/Vienna-Deep-Learning-Meetup


VDLM YouTube Channel: www.youtube.com/Vienna.Deep.Learning.Meetup


VDLM on GitHub
➔ Talks: slides & videos
➔ Wiki with beginner’s resources
github.com/vdlm/meetups


www.meetup.com/Vienna-Deep-Learning-Meetup


WeAreDevelopers AI Congress Vienna, December 2018
Bias in Natural Language Processing
Navid Rekabsaz, Idiap Research Institute
www.navid-rekabsaz.com | @navidrekabsaz


Agenda
§ Episode 1: Deep Learning and Challenges
§ Episode 2: Word Embedding and the word2vec Algorithm
§ Episode 3: Gender Bias Quantification in Word Embedding


What We Talk About When We Talk About Deep Learning
§ Neural networks are non-linear functions with many parameters
§ Usually trained to maximize likelihood
§ Generally optimized using variants of Stochastic Gradient Descent (SGD); see the sketch below
Adapted from http://mlss.tuebingen.mpg.de/2017/speaker_slides/Zoubin1.pdf
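To make these bullets concrete, here is a minimal, hypothetical sketch (not from the talk) of training by SGD: a one-layer logistic model stands in for a deep network, and each step follows the gradient of the log-likelihood for a single example.

```python
import numpy as np

# Toy data: 2-D inputs, binary labels (hypothetical example).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w = np.zeros(2)   # the model's parameters
lr = 0.1          # learning rate

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Stochastic Gradient Descent: maximize the log-likelihood
# by following its gradient one example at a time.
for epoch in range(5):
    for i in rng.permutation(len(X)):
        p = sigmoid(X[i] @ w)         # predicted probability
        w += lr * (y[i] - p) * X[i]   # gradient of the log-likelihood
```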


What We Talk About When We Talk About Deep Learning
§ Deep Learning models the overall function as a composition of functions (layers); see the sketch below
§ With several algorithmic and architectural innovations (dropout, LSTM, ConvNet, Attention, GANs)
§ Backed by large datasets, large-scale computational resources, and enthusiasm from academia and industry!
Adapted from http://mlss.tuebingen.mpg.de/2017/speaker_slides/Zoubin1.pdf
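As a toy illustration of “composition of functions”, a hypothetical two-layer network written as nested function applications (sizes and weights are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(4, 2)), np.zeros(4)   # first layer: 2 -> 4
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)   # second layer: 4 -> 1

def layer1(x):
    return np.maximum(0.0, W1 @ x + b1)   # linear map + ReLU non-linearity

def layer2(h):
    return W2 @ h + b2                    # final linear layer

x = np.array([0.5, -1.0])
output = layer2(layer1(x))                # the whole network: layer2 ∘ layer1
```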


§ Language and Speech Processing: machine translation, search engines, dialog systems
§ Computer Vision: face recognition, object and handwriting detection
§ Information Systems: personalized recommendation systems, search engines
§ Self-driving cars
§ Scientific Data Analysis: medicine, astronomy, energy
§ Financial Prediction
Adapted from http://mlss.tuebingen.mpg.de/2017/speaker_slides/Zoubin1.pdf


Beyond Deep Learning
§ Safety, security, and adversarial attacks
§ Uncertainty: what we know that we don’t know
§ Interpretability
§ Bias
  - Data
  - End user, annotator
  - Algorithm
§ Fairness and transparency
§ Data and model ownership
§ AI democratization
§ Resource consumption
§ …
https://blog.openai.com/adversarial-example-research/


Bias in Machine Translation
[Figure: the same gender-neutral pronoun translated with different genders]


Decomposing a Neural Network Architecture
[Figure: a “basic” neural machine translation architecture, with the word embedding layer highlighted]
Word embedding is a crucial building block of (almost) every neural NLP system


Word Embedding
§ Representing each word as a vector with d dimensions (toy example below)
§ The dimensions reflect concepts in the language
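As a toy example (hypothetical vectors, not from the slides): each word maps to a d-dimensional vector, and geometric closeness between vectors reflects relatedness between words.

```python
import numpy as np

# Hypothetical 4-dimensional embeddings (d = 4).
embedding = {
    "beer":  np.array([0.9, 0.1, 0.3, 0.0]),
    "lager": np.array([0.8, 0.2, 0.4, 0.1]),
    "nurse": np.array([0.0, 0.9, 0.1, 0.7]),
}

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

print(cosine(embedding["beer"], embedding["lager"]))  # high: related words
print(cosine(embedding["beer"], embedding["nurse"]))  # low: unrelated words
```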


[Figure: word embeddings projected into a two-dimensional space]


Learning Word Embedding
[Figure: the embedding learner as a black box]


word2vec: a neural word embedding algorithm


Intuition for Computational Linguistics
“In most cases, the meaning of a word is its use.”
Ludwig Wittgenstein, Philosophical Investigations (1953)


Intuition for Computational Linguistics
“You shall know a word by the company it keeps!”
J. R. Firth, A synopsis of linguistic theory 1930–1955 (1957)

[Figure: word cloud of context words around “Tesgüino”: drink, alcoholic, beverage, fermented, out of corn, bottle, Mexico. Nida (1975)]

[Figure: word cloud of context words around “Märzen”: bottle, Bavaria, brew, drink, bar, alcoholic, bitter, Oktoberfest, lager]


Tesgüino ←→ Märzen
Algorithmic intuition: two words are related when they share many context words (sketch below)
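A hypothetical sketch of that intuition: represent each word by the set of words observed around it and measure the overlap of these sets (the word lists are illustrative, not from a real corpus).

```python
# Context words observed around each target word (illustrative).
contexts = {
    "tesgüino": {"drink", "alcoholic", "beverage", "fermented", "bottle", "corn"},
    "märzen":   {"drink", "alcoholic", "bottle", "brew", "lager", "oktoberfest"},
    "nurse":    {"hospital", "patient", "care", "medicine"},
}

def jaccard(a, b):
    """Overlap of two context sets: shared words / all words."""
    return len(a & b) / len(a | b)

print(jaccard(contexts["tesgüino"], contexts["märzen"]))  # high overlap
print(jaccard(contexts["tesgüino"], contexts["nurse"]))   # no overlap
```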


Word Embedding with Neural Networks
Recipe for creating word embeddings with neural networks
§ No panic! Figures come next!


Neural Word Embedding Architecture
Training sample: (Tesgüino, drink)
[Figure: one-hot input layer → linear activation → words matrix, then context-words matrix → softmax output layer; forward pass and backpropagation shown]
https://web.stanford.edu/~jurafsky/slp3/
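A minimal numpy sketch of the forward pass under stated assumptions (vocabulary size, dimension, and initialization are hypothetical): the one-hot input simply selects one row of the words matrix, and the softmax normalizes scores against the context-words matrix over the entire vocabulary.

```python
import numpy as np

V, d = 10_000, 300                       # vocabulary size, embedding dimension
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(V, d))   # words (target) matrix
C = rng.normal(scale=0.1, size=(V, d))   # context-words matrix

def forward(word_id):
    h = W[word_id]                       # one-hot input just selects one row
    scores = C @ h                       # score every word in the vocabulary
    e = np.exp(scores - scores.max())    # softmax: normalize over all V words
    return e / e.sum()

probs = forward(word_id=42)              # P(context word | target word)
```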

[Figure sequence: word vectors of “Tesgüino” and “Märzen” and the context vector of “drink” plotted in the embedding space over successive training steps]


Neural Word Embedding - Summary
§ Normalization (the softmax over the entire vocabulary) is too expensive! (see the formula below)
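In symbols (standard word2vec notation, not verbatim from the slides): the softmax denominator sums over the whole vocabulary V, so every single training update would touch all context vectors.

```latex
P(c \mid w) = \frac{\exp(\mathbf{u}_c \cdot \mathbf{v}_w)}
                   {\sum_{c' \in V} \exp(\mathbf{u}_{c'} \cdot \mathbf{v}_w)}
```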


word2vec (Skip-Gram) with Negative Sampling
[Figure: the softmax output is replaced by a sigmoid]


word2vec with Negative Sampling – Objective Function
§ Training samples and negative samples (objective shown below)
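For reference, the standard Skip-Gram-with-negative-sampling objective from Mikolov et al. (2013), maximized for each training pair (w, c) with k negative samples drawn from a noise distribution P_n; the slide’s exact notation is not recoverable:

```latex
\log \sigma(\mathbf{u}_c \cdot \mathbf{v}_w)
  + \sum_{i=1}^{k} \mathbb{E}_{c_i \sim P_n(c)}
      \left[ \log \sigma(-\mathbf{u}_{c_i} \cdot \mathbf{v}_w) \right]
```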

[Figure sequence: for the training sample (Tesgüino, drink), k negative context words are sampled; the word and context vectors are updated, bringing “Tesgüino” and “drink” (and eventually “Märzen”) together]
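A hypothetical numpy sketch of one such update (random initialization, illustrative word IDs): the observed context vector is pulled toward the word vector, and the k sampled negatives are pushed away.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d, k, lr = 10_000, 100, 5, 0.05
W = rng.normal(scale=0.1, size=(V, d))   # word (target) vectors
C = rng.normal(scale=0.1, size=(V, d))   # context vectors

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgns_update(w_id, c_id):
    """One SGD step on the negative-sampling objective for the pair (w, c)."""
    neg_ids = rng.integers(0, V, size=k)        # k sampled negative contexts
    g_pos = sigmoid(C[c_id] @ W[w_id]) - 1.0    # gradient factor, positive pair
    grad_w = g_pos * C[c_id]
    C[c_id] = C[c_id] - lr * g_pos * W[w_id]    # pull context toward word
    for n in neg_ids:
        g_neg = sigmoid(C[n] @ W[w_id])         # push negatives away
        grad_w = grad_w + g_neg * C[n]
        C[n] = C[n] - lr * g_neg * W[w_id]
    W[w_id] = W[w_id] - lr * grad_w

sgns_update(w_id=42, c_id=7)   # e.g. the pair (Tesgüino, drink)
```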


Gender Bias Quantification: how bias is encoded in word embedding

[Figure: embedding space with vectors for “she”, “he”, “nurse”, “housekeeper”, and “manager”: “nurse” and “housekeeper” lie closer to “she”, while “manager” lies closer to “he”]
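A minimal sketch of one common way to quantify this (a cosine-similarity difference toward “she” vs. “he”; the vectors are hypothetical and the slides’ exact metric is not recoverable):

```python
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def gender_bias(embedding, word):
    """Positive values lean toward 'she', negative toward 'he'."""
    return (cosine(embedding[word], embedding["she"])
            - cosine(embedding[word], embedding["he"]))

# Hypothetical 2-D vectors, chosen only to illustrate the sign convention.
emb = {
    "she":     np.array([1.0, 0.0]),
    "he":      np.array([0.0, 1.0]),
    "nurse":   np.array([0.9, 0.2]),
    "manager": np.array([0.3, 0.8]),
}
print(gender_bias(emb, "nurse"))    # > 0: biased toward "she"
print(gender_bias(emb, "manager"))  # < 0: biased toward "he"
```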


Gender Bias in Wikipedia
§ The bias of 350 occupations toward female/male in a word2vec model trained on English Wikipedia


Conclusion
§ Episode 1
  - Challenges in Deep Learning
  - Data bias in NLP systems
§ Episode 2
  - Semantics and word embedding
  - The word2vec algorithm
§ Episode 3
  - Bias quantification
  - Gender bias shown in Wikipedia text


Thanks!
www.navid-rekabsaz.com | @navidrekabsaz