
Master Seminar: Deep Learning for Medical Applications
MixMatch: A Holistic Approach to Semi-Supervised Learning
David Berthelot, Avital Oliver, Nicholas Carlini, Nicolas Papernot, Ian Goodfellow, Colin Raffel
Presented by: Festina Ismali
Tutor: Tariq Mousa Bdair

Semi-Supervised Learning (SSL)
• Unsupervised learning: train a model with no labeled data available
• Semi-supervised learning (SSL): train a model with a small labeled dataset and a large unlabeled dataset
• Supervised learning: train a model with a fully labeled dataset
• SSL objective: improve the learner's performance by utilizing the unlabeled data, alleviating the need for labels

When Can Semi-Supervised Learning Work?
• Certain assumptions need to hold, e.g., that classes are separated by low-density regions, which is what motivates entropy minimization [11]

Problem Statement
• By leveraging large collections of labeled data, deep neural networks can achieve human-level performance
• However, in practice, creating such large, completely labeled datasets is:
  – Tedious and error-prone
  – Time-consuming
  – Difficult and costly, especially in medical domains, since expert knowledge is required

Motivation
• Many recent semi-supervised algorithms use one of the currently dominant approaches:
  – Consistency regularization
  – Entropy minimization
  – Traditional regularization
• MixMatch, on the other hand, combines all of these approaches in a unified manner, and as a result obtains:
  – State-of-the-art results on all standard image benchmarks
  – State-of-the-art results under the PATE (Private Aggregation of Teacher Ensembles) framework

Related Work: Consistency Regularization
• Encourages the model to output the same predictions for different augmentations of an unlabeled example x, by adding a loss term (first proposed in [2, 3]; reconstructed below)
  – Π-Model [2]
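The loss term itself (given in the MixMatch paper; the slide's formula did not survive extraction) penalizes disagreement between two stochastic augmentations of the same input:

$$\big\| p_{\text{model}}\big(y \mid \text{Augment}(x); \theta\big) - p_{\text{model}}\big(y \mid \text{Augment}(x); \theta\big) \big\|_2^2$$

Note that Augment(x) is a stochastic transformation, so the two terms above are evaluated on two different augmentations of x.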

Related Work: Consistency Regularization
  – Mean Teacher [4]: averages model weights instead of predictions
  – Virtual Adversarial Training (VAT) [5]: the perturbation applied to the input is carefully chosen (not stochastic) to maximally change the output class distribution

Related Work: Entropy Minimization
• One of the assumptions SSL relies on is that the decision boundary should lie in a low-density region, i.e., we expect classes to be well separated
• This can be achieved by minimizing the entropy of the model's predictions on unlabeled data x (written out below)
  – Pseudo-Label [6]: entropy minimization is achieved implicitly by constructing hard labels from high-confidence predictions and using them as if they were true labels
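A standard way to write the entropy-minimization term for L classes (likely what the slide displayed; the notation follows the consistency loss above):

$$-\sum_{k=1}^{L} p_{\text{model}}(y = k \mid x; \theta) \, \log p_{\text{model}}(y = k \mid x; \theta)$$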

Related Work: Traditional Regularization
• Imposes a constraint on a model to avoid overfitting, e.g., weight decay (L2 regularization)
• In addition to this traditional regularization, another method, MixUp [7], is applied to labeled datapoints as a modern regularizer

MixUp
• Effect of MixUp (α = 1) on a toy problem [7]:
  – Green: class 0
  – Orange: class 1
  – Blue shading indicates p(y = 1 | x)
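As a sketch of what MixUp [7] computes, here is a minimal NumPy version of the standard formulation; the function name, array-based setting, and default alpha are illustrative assumptions, not taken from the slides:

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=1.0, rng=np.random.default_rng()):
    """Standard MixUp [7]: a convex combination of two examples and their
    one-hot labels, with the mixing weight drawn from Beta(alpha, alpha)."""
    lam = rng.beta(alpha, alpha)
    x = lam * x1 + (1.0 - lam) * x2   # blended input
    y = lam * y1 + (1.0 - lam) * y2   # blended (soft) label
    return x, y
```

Training on such blended pairs encourages the model to behave linearly between examples, which is what produces the smooth decision boundary in the toy problem above.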

MixUp [12]

MixMatch: How to Combine the Three Approaches
1. Consistency regularization: introduced by augmenting both the labeled and unlabeled data inputs (random horizontal flips and crops)
2. Entropy minimization: label guessing and sharpening on the unlabeled data reduce the entropy of the predictions
3. Traditional regularization: MixUp introduces a linear relationship between the data points

Methodology
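The slide's equations did not survive extraction; as a hedged reconstruction from the MixMatch paper, the method maps a batch of labeled examples X and unlabeled examples U to processed batches X′ and U′:

$$\mathcal{X}', \mathcal{U}' = \text{MixMatch}(\mathcal{X}, \mathcal{U}, T, K, \alpha)$$

where T is the sharpening temperature, K the number of augmentations per unlabeled example, and α the Beta-distribution parameter for MixUp. The resulting batches feed the combined loss given later in the Methodology.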

Methodology: Label Guessing and Sharpening [13]
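The two operations are defined in the MixMatch paper as follows (reconstructed here because the slide's formulas were lost). Label guessing averages the model's predictions over K augmentations û_{b,k} of an unlabeled example u_b, and sharpening lowers the temperature T of the averaged distribution:

$$\bar{q}_b = \frac{1}{K} \sum_{k=1}^{K} p_{\text{model}}\big(y \mid \hat{u}_{b,k}; \theta\big)$$

$$\text{Sharpen}(p, T)_i = p_i^{1/T} \Big/ \sum_{j=1}^{L} p_j^{1/T}$$

where L is the number of classes. As T → 0 the sharpened distribution approaches a one-hot label, which is how MixMatch implicitly minimizes entropy.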

Methodology: Summary of the MixMatch Label Guessing Process
• Data augmentation → label guessing → sharpening (see the sketch below)
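A minimal NumPy sketch of this guess-and-sharpen pipeline, assuming hypothetical `model` and `augment` callables (K = 2 and T = 0.5 are the defaults reported in the paper):

```python
import numpy as np

def guess_labels(u, model, augment, K=2, T=0.5):
    """Guess labels for an unlabeled batch u: average the model's predicted
    class distributions over K stochastic augmentations, then sharpen."""
    # Label guessing: mean prediction across K augmentations of the same batch.
    q_bar = np.mean([model(augment(u)) for _ in range(K)], axis=0)
    # Sharpening: raise to the power 1/T and renormalize (T < 1 lowers entropy).
    q = q_bar ** (1.0 / T)
    return q / q.sum(axis=-1, keepdims=True)
```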

Methodology: MixUp in MixMatch
• MixUp is applied to both labeled data and unlabeled data with their label guesses
• The original MixUp algorithm is modified by adding a second step, which biases the mix towards the original image (equations below)
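The modification (from the MixMatch paper) replaces the sampled mixing weight with its maximum against its complement, so the mixed output always stays closer to the first argument:

$$\lambda \sim \text{Beta}(\alpha, \alpha), \qquad \lambda' = \max(\lambda, 1 - \lambda)$$

$$x' = \lambda' x_1 + (1 - \lambda') x_2, \qquad p' = \lambda' p_1 + (1 - \lambda') p_2$$

Because λ′ ≥ 1/2, labeled inputs mixed this way remain "mostly labeled" and unlabeled inputs remain "mostly unlabeled", which keeps the separate labeled and unlabeled loss terms meaningful.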

Methodology: The MixMatch Algorithm
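Since the algorithm figure did not survive extraction, here is a hedged NumPy sketch of the full MixMatch batch transformation, following Algorithm 1 of the paper. The `model` and `augment` callables are placeholders, `y` is assumed one-hot of shape (B, L), and a single λ is drawn per batch for brevity where the paper samples one per mixed pair:

```python
import numpy as np

def mixmatch(x, y, u, model, augment, T=0.5, K=2, alpha=0.75,
             rng=np.random.default_rng()):
    """One MixMatch step: augment, guess and sharpen labels, then apply the
    modified MixUp against a shuffled pool of all (labeled + unlabeled) data."""
    # 1. Augment the labeled batch once and the unlabeled batch K times.
    x_hat = augment(x)
    u_hat = np.stack([augment(u) for _ in range(K)])      # (K, B, ...)
    # 2. Guess labels: average predictions over the K augmentations, sharpen.
    q_bar = np.mean([model(u_hat[k]) for k in range(K)], axis=0)
    q = q_bar ** (1.0 / T)
    q /= q.sum(axis=-1, keepdims=True)
    # 3. Build a shuffled pool of all examples to serve as MixUp partners.
    u_all = u_hat.reshape((-1,) + u.shape[1:])            # (K*B, ...)
    q_all = np.tile(q, (K, 1))                            # matching guesses
    w_x = np.concatenate([x_hat, u_all])
    w_y = np.concatenate([y, q_all])
    perm = rng.permutation(len(w_x))
    w_x, w_y = w_x[perm], w_y[perm]
    # 4. Modified MixUp: lambda' = max(lambda, 1 - lambda) biases the mix
    #    towards the original example.
    lam = rng.beta(alpha, alpha)
    lam = max(lam, 1.0 - lam)
    n = len(x_hat)
    x_prime = lam * x_hat + (1 - lam) * w_x[:n]
    p_prime = lam * y + (1 - lam) * w_y[:n]
    u_prime = lam * u_all + (1 - lam) * w_x[n:]
    q_prime = lam * q_all + (1 - lam) * w_y[n:]
    return x_prime, p_prime, u_prime, q_prime
```

The returned batches (x′, p′) and (u′, q′) are then plugged into the labeled and unlabeled loss terms shown on the next Methodology slide.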

Methodology
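This slide most likely carried the training objective; reconstructed from the MixMatch paper, with H denoting cross-entropy, L the number of classes, and λ_U the weight on the unlabeled loss:

$$\mathcal{L}_{\mathcal{X}} = \frac{1}{|\mathcal{X}'|} \sum_{(x, p) \in \mathcal{X}'} \mathrm{H}\big(p, \; p_{\text{model}}(y \mid x; \theta)\big)$$

$$\mathcal{L}_{\mathcal{U}} = \frac{1}{L \, |\mathcal{U}'|} \sum_{(u, q) \in \mathcal{U}'} \big\| q - p_{\text{model}}(y \mid u; \theta) \big\|_2^2$$

$$\mathcal{L} = \mathcal{L}_{\mathcal{X}} + \lambda_{\mathcal{U}} \, \mathcal{L}_{\mathcal{U}}$$

The squared-L2 (Brier) loss on the unlabeled term is less sensitive to wrong guessed labels than cross-entropy, which is why the paper uses it there.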

Experimental Setup: Datasets
• The common practice in SSL evaluation is to treat most of the dataset as unlabeled and use a small portion as labeled data
• CIFAR-10: 10 image classes, with 6,000 images per class
• CIFAR-100: 100 image classes, with 600 images per class
• SVHN: 10 classes, one for each digit; over 600,000 digit images
• STL-10: 10 classes (airplane, bird, car, cat, deer, dog, horse, monkey, ship, truck)

Experimental Setup: Evaluation Baselines
• Virtual Adversarial Training (VAT)
• Mean Teacher
• Pseudo-Label
• MixUp (on its own as a baseline), modified for SSL: a cross-entropy loss is used between the MixUp-generated guess label and the model's prediction
Training setup:
• Wide ResNet-28 model
• Adam optimizer

Results and Discussion: CIFAR-10
• Fully supervised training on all 50,000 samples achieves an error rate of 4.17%

Results and Discussion: SVHN
• On SVHN+Extra, MixMatch outperformed fully supervised training on SVHN without extra data (2.59% error) for every labeled-data amount considered

Results and Discussion: Ablation Study
• All values are error rates on CIFAR-10 with 250 or 4,000 labels

Results and Discussion: Privacy-Preserving Learning and Generalization
• Accuracy-privacy trade-off achieved by MixMatch compared to a Virtual Adversarial Training (VAT) baseline on SVHN:
  – VAT: 91.6% test accuracy at privacy loss ε = 4.96
  – MixMatch: 95.21 ± 0.17% test accuracy at privacy loss ε = 0.97

Results and Discussion: Conclusions
• The MixMatch algorithm combines different paradigms of SSL and outperforms the current methods on all the baseline datasets by a significant margin
• It achieves a better accuracy-privacy trade-off for differential privacy, as it requires significantly less data than other methods to reach similar performance
Future work:
• Investigate which combinations of additional semi-supervised learning components result in effective algorithms
• Explore the effectiveness of MixMatch in other domains

Own Review and Discussion
Strengths:
• Well structured and organized
• Code implementation is accessible
• Fair comparison with other methods
Open questions:
• What is the effectiveness of this approach on medical data?
• Why does MixUp work so well compared to other types of regularization, i.e., why should enforcing linearity in predictions between images help the model?

THANK YOU FOR YOUR ATTENTION! Any Questions/Comments?

References
[1] J. E. van Engelen and H. H. Hoos. A survey on semi-supervised learning. Machine Learning 109, 373–440, 2020.
[2] Samuli Laine and Timo Aila. Temporal ensembling for semi-supervised learning. In ICLR, 2017.
[3] Mehdi Sajjadi, Mehran Javanmardi, and Tolga Tasdizen. Regularization with stochastic transformations and perturbations for deep semi-supervised learning. In Advances in Neural Information Processing Systems, 2016.
[4] Antti Tarvainen and Harri Valpola. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. In Advances in Neural Information Processing Systems, 2017.
[5] Takeru Miyato, Shin-ichi Maeda, Shin Ishii, and Masanori Koyama. Virtual adversarial training: a regularization method for supervised and semi-supervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018.
[6] Dong-Hyun Lee. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In ICML Workshop on Challenges in Representation Learning, 2013.
[7] Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, and David Lopez-Paz. mixup: Beyond empirical risk minimization. arXiv preprint arXiv:1710.09412, 2017.

References (cont.)
[8] Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. In Proceedings of the British Machine Vision Conference (BMVC), 2016.
[9] Avital Oliver, Augustus Odena, Colin Raffel, Ekin Dogus Cubuk, and Ian Goodfellow. Realistic evaluation of deep semi-supervised learning algorithms. In Advances in Neural Information Processing Systems, pages 3235–3246, 2018.
[10] Nicolas Papernot, Martín Abadi, Úlfar Erlingsson, Ian Goodfellow, and Kunal Talwar. Semi-supervised knowledge transfer for deep learning from private training data. arXiv preprint arXiv:1610.05755, 2016.
[11] Olivier Chapelle, Bernhard Schölkopf, and Alexander Zien. Semi-Supervised Learning. MIT Press, 2006.
[12] Shahine Bouabid and Vincent Delaitre. Mixup regularization for region proposal based object detectors. arXiv preprint arXiv:2003.02065, 2020.
[13] Noah Rubinstein. A fastai/PyTorch implementation of MixMatch. Towards Data Science, 17 Jun 2017.
[14] Vikas Verma, Alex Lamb, Juho Kannala, Yoshua Bengio, and David Lopez-Paz. Interpolation consistency training for semi-supervised learning. arXiv preprint arXiv:1903.03825, 2019.

EXTRA SLIDES

Experimental Setup: Training Setup
• Wide ResNet-28 model [8, 9]:
  – Depth: 28 layers
  – Width: 2 (double the feature maps of the original ResNet model)
  – Batch normalization and leaky ReLU nonlinearities

Experimental Setup: Training Setup
• Structure of wide residual networks:
  – k scales the width of the residual blocks
  – N is the number of blocks in a group
  – Filter size is 3×3; B(3, 3) denotes a residual block with 3×3 convolutional layers
• Adam optimizer for training
• Weight decay of 0.0004 at each update for the Wide ResNet-28 model
• Checkpoint every 2^16 training samples and report the median error rate of the last 20 checkpoints

Interpolation Consistency Training [14]
• A special case in the ablation study where only unlabeled MixUp is used, no sharpening is applied, and EMA parameters are used for label guessing