Week 6 Presentation Predictive coding (Yogesh) Tamique de Brito

Review of project concept

● The main idea of this project is to learn self-supervised video embeddings by performing predictive coding based on arbitrary times
  ○ Predictive coding means predicting a latent representation
  ○ Similar work has been done before, but there the latent representations are taken from consecutive times and the representation is predicted at the next timestep
  ○ There are similar approaches to this arbitrary-timestep idea in NLP, but to my knowledge they have not been applied to computer vision
● The approach is as follows (a code sketch follows this list):
  ○ Take a random group of input clips and a target clip, and encode all of them
  ○ Pair the input encodings with (transformer-style) temporal encodings and pass them through an aggregator to obtain a global representation
  ○ Pair the global representation with the target clip's temporal encoding (which gives temporal context) and pass this through a predictor to get a prediction for the latent embedding of the target clip
  ○ Can also jointly do classification
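To make the pipeline concrete, here is a minimal PyTorch sketch of the approach described above. Everything in it is an assumption for illustration: the class names, the linear-layer stand-in for the clip encoder, the transformer aggregator, the MLP predictor, and all dimensions are placeholders, not the project's actual code.

```python
import math

import torch
import torch.nn as nn


class PredictiveCodingSketch(nn.Module):
    """Hypothetical sketch of arbitrary-timestep predictive coding.
    Layer choices and dimensions are assumptions, not the project's code."""

    def __init__(self, feat_dim=1024, dim=512):
        super().__init__()
        self.dim = dim
        # Clip encoder: stands in for a video backbone (e.g. I3D) that maps
        # pre-extracted clip features to a latent vector.
        self.encoder = nn.Linear(feat_dim, dim)
        # Aggregator: pools the temporally tagged input embeddings into one
        # global representation; a small transformer encoder is one candidate.
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.aggregator = nn.TransformerEncoder(layer, num_layers=2)
        # Predictor: maps (global representation, target time code) to a
        # predicted latent for the target clip.
        self.predictor = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim)
        )

    def temporal_encoding(self, t):
        # Transformer-style sinusoidal encoding of arbitrary (real-valued)
        # timesteps; maps shape (...,) to (..., dim).
        half = self.dim // 2
        freqs = torch.exp(
            -math.log(10000.0)
            * torch.arange(half, device=t.device, dtype=torch.float32)
            / half
        )
        angles = t.unsqueeze(-1) * freqs
        return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)

    def forward(self, input_feats, input_times, target_time):
        # input_feats: (batch, n_clips, feat_dim); input_times: (batch, n_clips);
        # target_time: (batch,)
        z = self.encoder(input_feats) + self.temporal_encoding(input_times)
        g = self.aggregator(z).mean(dim=1)            # global representation
        t_code = self.temporal_encoding(target_time)  # target temporal context
        return self.predictor(torch.cat([g, t_code], dim=-1))


# Example shapes: 4 input clips per video with pre-pooled 1024-d features.
# model = PredictiveCodingSketch()
# pred = model(torch.randn(2, 4, 1024), torch.rand(2, 4), torch.rand(2))
```

Training would then compare this prediction against the encoder's embedding of the target clip (for instance with an L2 or contrastive objective), and the joint-classification option mentioned above would add a classification head, e.g. on the global representation; both the loss choice and the head placement are assumptions here.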

What has been done this week

● Re-coded everything
  ○ Everything runs faster now; the efficiency problems of last week have been fixed
  ○ Workflow is much better
  ○ Due to Newton time limits, have switched to running experiments on a local machine on a subset of the data
● Ran some experiments (one aggregator variant is sketched below):
  ○ Single-clip baselines
  ○ Tested different architectures for the aggregator
  ○ Implemented I3D as the encoder; will test this soon as well
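As a hedged illustration of the aggregator experiments, a mean-pooling variant like the one below could be swapped against the transformer aggregator in the earlier sketch; the class name, the MLP shape, and the idea that this is one of the tested variants are all assumptions.

```python
import torch
import torch.nn as nn


class MeanPoolAggregator(nn.Module):
    """Hypothetical aggregator variant: mean-pool the temporally tagged clip
    embeddings, then mix with a small MLP (an assumed example of a
    different aggregator architecture, not the project's code)."""

    def __init__(self, dim=512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim)
        )

    def forward(self, z):
        # z: (batch, n_clips, dim) temporally tagged clip embeddings
        return self.mlp(z.mean(dim=1))  # (batch, dim) global representation
```

A single-clip baseline, by contrast, would bypass the aggregator and predictor entirely and use one clip's embedding directly.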

For next week

● Finish running current experiments
● When the GPU quota renews, rerun on Newton with the full dataset