Visual Question Answering
Aaron Honculada, Aisha Urooj, Dr. Mubarak Shah, Dr. Niels Lobo
- Slides: 24
TVQA Dataset
• 460 hours of video
• 152,545 question and answer pairs
• 21,793 clips (60-90 sec)
• Multimodal Compositionality
• Video-QA
• Associated natural language (subtitles)
Questions
• Main question part
• Grounding part
• Temporal localization
• Each clip has 7 questions
• Each question has 5 multiple-choice answers
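The question format above can be sketched as a small record type. This is only an illustration: the class name, field names, and the example question are hypothetical and are not the dataset's actual schema. It also checks two numbers implied by the slides: the chance accuracy for 5 answer choices, and the average number of questions per clip from the dataset totals.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TVQAQuestion:
    """Hypothetical container for one multiple-choice question (not the real schema)."""
    clip_id: str
    question: str                                    # main part plus grounding part
    answers: List[str]                               # five candidate answers
    correct_idx: int                                 # index of the correct candidate
    span_sec: Optional[Tuple[float, float]] = None   # temporal localization, if given

    def __post_init__(self) -> None:
        if len(self.answers) != 5:
            raise ValueError("each question is expected to have 5 answers")
        if not 0 <= self.correct_idx < 5:
            raise ValueError("correct_idx must point at one of the 5 answers")


def chance_accuracy(num_choices: int = 5) -> float:
    """Accuracy of uniform random guessing over the answer choices."""
    return 1.0 / num_choices


def questions_per_clip(num_qa_pairs: int, num_clips: int) -> float:
    """Average number of questions per clip."""
    return num_qa_pairs / num_clips


# Made-up example question, for illustration only.
example = TVQAQuestion(
    clip_id="clip_0001",
    question="What is the person holding when they enter the room?",
    answers=["A cup", "A book", "A phone", "A bag", "Nothing"],
    correct_idx=0,
    span_sec=(12.5, 18.0),
)

print(f"chance accuracy: {chance_accuracy():.0%}")
print(f"questions per clip: {questions_per_clip(152545, 21793):.2f}")
```

The dataset totals work out to just under 7 questions per clip, which agrees with the per-clip figure on the slide.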
TVQA
TVQA
• Subtitles
• Visual Concepts
• Object detection
• Concatenate
• Remove duplicates
• Video Features
• ResNet
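The "concatenate, remove duplicates" step for visual concepts can be sketched as below. This is a minimal sketch under assumptions: the function name is hypothetical, the detector output is assumed to be one list of label strings per frame, and keeping first-seen order with lowercase normalization is a choice made here, not something the slides specify.

```python
from typing import Iterable, List


def merge_visual_concepts(per_frame_labels: Iterable[Iterable[str]]) -> List[str]:
    """Concatenate detected labels across frames and drop duplicates.

    Labels are lowercased and stripped (an assumption), and the first
    occurrence of each label decides its position in the output.
    """
    seen = set()
    merged: List[str] = []
    for labels in per_frame_labels:
        for label in labels:
            key = label.strip().lower()
            if key and key not in seen:
                seen.add(key)
                merged.append(key)
    return merged


# Made-up detector output for three frames of one clip.
frames = [["man", "couch"], ["couch", "laptop"], ["Man", "cup"]]
print(merge_visual_concepts(frames))
```

The merged list can then be treated as a short text sequence describing the clip, alongside the subtitles.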
Model Used
Baseline Models
• LSTM
• BiLSTM
Baseline Models
• Baseline CNN+LSTM
Results
Accuracy (%), TVQA model:
Model Used        Reported   Replication
TVQA + S          65.15      65.74
TVQA + V          45.03      45.25
TVQA + IMG        43.78      44.42
TVQA + V + IMG    N/A        45.52
Accuracy (%), baseline models:
Model     Q       S+Q     V+Q     (FC) V+Q   S+V+Q
LSTM      42.74   42.71   42.61   42.85      42.39
BiLSTM    42.48   42.67   -       42.86      42.84
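To make the reported-versus-replicated comparison explicit, the gaps can be computed directly from the table. This is a small arithmetic sketch: the dictionary layout and the function name are assumptions, and the numbers are copied from the slides. The TVQA + V + IMG setting has no reported value, so it is left out of the comparison.

```python
# Accuracies (%) copied from the results table on the slides.
reported = {"TVQA + S": 65.15, "TVQA + V": 45.03, "TVQA + IMG": 43.78}
replicated = {
    "TVQA + S": 65.74,
    "TVQA + V": 45.25,
    "TVQA + IMG": 44.42,
    "TVQA + V + IMG": 45.52,  # no reported value to compare against
}


def replication_gaps(reported: dict, replicated: dict) -> dict:
    """Replicated minus reported accuracy, in percentage points,
    for every setting that has both numbers."""
    return {
        name: round(replicated[name] - reported[name], 2)
        for name in reported
        if name in replicated
    }


gaps = replication_gaps(reported, replicated)
for name, gap in gaps.items():
    print(f"{name}: {gap:+.2f} points")
```

In all three comparable settings the replication lands within one percentage point of the reported accuracy.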
Summary and Next Steps
• Reproduced results
• Baseline results
• Look into network mistakes and address them
• Main goal: boost performance by using visual cues effectively