Predicting Lymph Node Metastasis Using Histopathological Images Based on Multiple Instance Learning With Deep Graph Convolution
Seminar: Deep Learning for Medical Applications
Student: Yawei Li

Outline
• Weakly Supervised Learning
• Problem Settings
• Proposed Method
• Implementation Details
• Discussion and Conclusion
Computer Aided Medical Procedures

Introduction to Weakly Supervised Learning

Examples of Weakly Supervised Learning

Multiple Instance Learning (MIL)
Bag of instances
Whole slide image (WSI): 100,000 x 5,000
Positive …
• 1 WSI = 1 bag = n instances
• Only bag-level labels
• Goal: bag-level classification
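
The bag structure on this slide can be sketched as follows. This is only an illustration, assuming non-overlapping square patches and a tiny synthetic array in place of a real WSI; the names `Bag` and `tile_into_instances` are hypothetical and do not come from [1].

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Bag:
    """One WSI treated as a bag: many instances, a single bag-level label."""
    instances: np.ndarray  # shape (n, patch, patch, channels)
    label: int             # 1 = positive, 0 = negative; no per-instance labels


def tile_into_instances(image: np.ndarray, patch: int) -> np.ndarray:
    """Cut an (H, W, C) image into non-overlapping patch x patch tiles.

    Border pixels that do not fill a whole tile are dropped (an assumption).
    """
    h, w, c = image.shape
    rows, cols = h // patch, w // patch
    cropped = image[: rows * patch, : cols * patch]
    # (rows, patch, cols, patch, C) -> (rows, cols, patch, patch, C)
    tiles = cropped.reshape(rows, patch, cols, patch, c).swapaxes(1, 2)
    return tiles.reshape(rows * cols, patch, patch, c)


# Synthetic stand-in for a slide; a real WSI is far larger.
slide = np.arange(6 * 8 * 3).reshape(6, 8, 3)
bag = Bag(instances=tile_into_instances(slide, patch=2), label=1)
print(bag.instances.shape)  # (12, 2, 2, 3)
```

The only supervision carried by `bag` is the single `label` field, which is what makes the setting weakly supervised.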

A Naïve Approach
How can we learn a model with only bag-level labels?

Method Overview
Figure from [1]
• VAE-GAN extracts instance-level features
• Feature selection (FS) discards redundant features
• GCN aggregates features and learns a bag-level representation

VAE-GAN
Figure from [1]

Feature Selection
Figure from [1]
Hyper-parameters:
• Number of bins
• Threshold of MMD distance
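
A rough sketch of histogram-based feature selection with maximum mean discrepancy (MMD) is given below. It assumes a linear kernel over per-bag histograms and a keep-the-top-fraction rule instead of an explicit MMD threshold; the function names are hypothetical, and the exact procedure in [1] may differ.

```python
import numpy as np


def bag_histograms(bag, bins, lo, hi):
    """Per-feature normalized histograms for one bag.

    bag: (n_instances, n_features) -> (n_features, bins).
    Rows sum to 1 when all values fall inside [lo, hi].
    """
    hists = [np.histogram(bag[:, k], bins=bins, range=(lo, hi))[0]
             for k in range(bag.shape[1])]
    return np.asarray(hists, dtype=float) / bag.shape[0]


def mmd_scores(pos_bags, neg_bags, bins=50, lo=0.0, hi=1.0):
    """Linear-kernel MMD^2 per feature: squared distance between the mean
    histogram of the positive bags and that of the negative bags."""
    pos = np.mean([bag_histograms(b, bins, lo, hi) for b in pos_bags], axis=0)
    neg = np.mean([bag_histograms(b, bins, lo, hi) for b in neg_bags], axis=0)
    return np.sum((pos - neg) ** 2, axis=1)


def select_features(pos_bags, neg_bags, rate=0.5, **kw):
    """Keep the `rate` fraction of features with the largest MMD score."""
    scores = mmd_scores(pos_bags, neg_bags, **kw)
    keep = max(1, int(round(rate * scores.size)))
    return np.sort(np.argsort(scores)[::-1][:keep])


# Toy data: features 0 and 2 differ between classes, 1 and 3 do not.
n = 20
spread = np.linspace(0.01, 0.99, n)
high = np.linspace(0.61, 0.99, n)
low = np.linspace(0.01, 0.39, n)
pos_bag = np.stack([high, spread, high, spread], axis=1)
neg_bag = np.stack([low, spread, low, spread], axis=1)
selected = select_features([pos_bag, pos_bag], [neg_bag, neg_bag],
                           rate=0.5, bins=10)
print(selected)  # [0 2]
```

Both hyper-parameters listed on the slide appear here: `bins` controls the histogram resolution, and `rate` plays the role of the cut-off on the MMD score.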

Aggregation of Instance-level Features
Hyper-parameters:
• Distance threshold between nodes
• GCN-related hyper-parameters
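
The two steps behind these hyper-parameters can be sketched in NumPy. The sketch assumes Euclidean distance between instance features for the edge rule and the widely used symmetric-normalized GCN propagation rule; the weights are random and untrained, mean pooling stands in for self-attention graph pooling, and all names are hypothetical.

```python
import numpy as np


def build_adjacency(features, threshold):
    """Connect two instances when their feature distance is below threshold."""
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    adj = (dist < threshold).astype(float)
    np.fill_diagonal(adj, 0.0)  # self-loops are added inside the GCN layer
    return adj


def gcn_layer(adj, h, weight):
    """One layer: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ h @ weight, 0.0)


rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 4))   # 6 instances, 4 selected features
w = rng.normal(size=(4, 3))       # untrained weights, for shapes only
adj = build_adjacency(feats, threshold=2.5)
bag_repr = gcn_layer(adj, feats, w).mean(axis=0)  # mean-pooling stand-in

# Reordering the instances leaves the bag representation unchanged.
perm = rng.permutation(6)
adj_p = build_adjacency(feats[perm], threshold=2.5)
bag_repr_p = gcn_layer(adj_p, feats[perm], w).mean(axis=0)
```

Because edges depend only on feature distances, the graph carries no information about where an instance sits on the slide, and the pooled output is the same for any instance ordering.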

Implementation
• Dataset:
  – 425 WSIs
  – 174 positives; 251 negatives
• Architecture:
  – VAE-GAN:
    • Encoder: ResNet-18
    • Decoder: 5 x (TransposeConv + BN + ReLU)
    • Discriminator: 5 x (Conv + BN + LeakyReLU)
  – Feature selection:
    • Histogram bins: 50
    • Feature selection rate: 0.5
  – GCN:
    • 3 x (GCN + ReLU)
    • Self-attention graph pooling
    • Classifier: 2 x FC
    • Binary cross entropy
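
The slide does not give kernel sizes, strides, or the size of the latent feature map. As a sanity check on the decoder depth, the sketch below assumes each transposed convolution uses kernel 4, stride 2, padding 1 (a common upsampling choice, not confirmed by [1]) and a hypothetical 4 x 4 starting map. Under those assumptions, five blocks reach the 128 x 128 instance size used in the experiments.

```python
def transposed_conv_out(size, kernel, stride, padding, output_padding=0):
    """Output side length of a transposed convolution (dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


def decoder_sizes(start, layers=5, kernel=4, stride=2, padding=1):
    """Spatial size after each of `layers` identical upsampling blocks."""
    sizes = [start]
    for _ in range(layers):
        sizes.append(transposed_conv_out(sizes[-1], kernel, stride, padding))
    return sizes


sizes = decoder_sizes(start=4)
print(sizes)  # [4, 8, 16, 32, 64, 128]
```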

Experiments
• Train-test split ratio: 5:1
• 5-fold CV
• Resize instances to 128 x 128
ResNet + Voting and WISA have extra requirements:
• Discriminative info at the instance level
• Existence of key instances strongly related to bag labels
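
The slide lists a 5:1 train-test split and 5-fold CV without saying how the two combine. The sketch below assumes one plausible reading: hold out one sixth of the slides as a test set, then run 5-fold cross-validation on the remainder. Stratifying by class is also an assumption, and the function names are hypothetical.

```python
import random


def stratified_split(labels, test_parts=1, total_parts=6, seed=0):
    """Hold out test_parts/total_parts of each class as the test set."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        ids = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(ids)
        n_test = round(len(ids) * test_parts / total_parts)
        test += ids[:n_test]
        train += ids[n_test:]
    return sorted(train), sorted(test)


def stratified_folds(ids, labels, k=5, seed=0):
    """Deal each class round-robin into k folds of near-equal size."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels[i] for i in ids)):
        members = [i for i in ids if labels[i] == cls]
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    return folds


labels = [1] * 174 + [0] * 251   # class counts from the dataset slide
train_ids, test_ids = stratified_split(labels)
folds = stratified_folds(train_ids, labels)
print(len(train_ids), len(test_ids))  # 354 71
print([len(f) for f in folds])        # [71, 71, 71, 71, 70]
```

Each fold in turn serves as the validation set while the other four are used for training; the held-out test set is touched only once at the end.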

Ablation Study
• Two-stage: (A), (B), (C), (D): freeze backbone (VAE-GAN or ResNet)
• One-stage: (E); jointly fine-tune backbone with GCN
• (A) is the best setting
• (D) vs. (E): two-stage better
• (A) vs. (B): FS is necessary

Conclusion and My Own Thoughts
• Pros:
  – Trains the feature extractor in a self-supervised manner
  – Light-weight FS method
  – GCN learns better representations than average pooling or softmax
• Cons:
  – FS is sensitive to the number of bins and the MMD threshold
  – Does not consider the spatial location of instances (GCN is permutation invariant)
  – Graph construction is sensitive to the distance threshold
• Other options:
  – Two-stage:
    • Deeper CNN as feature extractor
    • Use self-supervised training: RotNet [2], MoCo [3]
  – One-stage:
    • Vision Transformer (ViT) [4]
    • Each instance is a token
    • Spatial encoding

Thank you for your attention
Questions?

References
[1] Y. Zhao, F. Yang, Y. Fang, H. Liu, N. Zhou, J. Zhang, J. Sun, S. Yang, B. Menze, X. Fan et al., “Predicting lymph node metastasis using histopathological images based on multiple instance learning with deep graph convolution,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 4837–4846.
[2] S. Gidaris, P. Singh, and N. Komodakis, “Unsupervised representation learning by predicting image rotations,” arXiv preprint arXiv:1803.07728, 2018.
[3] K. He, H. Fan, Y. Wu, S. Xie, and R. Girshick, “Momentum contrast for unsupervised visual representation learning,” arXiv preprint arXiv:1911.05722, 2019.
[4] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly et al., “An image is worth 16x16 words: Transformers for image recognition at scale,” arXiv preprint arXiv:2010.11929, 2020.