Predicting Visual Search Targets via Eye Tracking Data


Manisha Gupta (manishagupta@utexas.edu), University of Texas at Austin; Dr. Ali Borji, Fawad Ahmed, University of Central Florida

I. Purpose & Applications
Ø Predict visual search targets in closed- and open-world settings
Ø Develop a learning mechanism over an unknown set of targets
Ø Applications: implemented in cameras to scan the environment and pinpoint where a lost object (e.g., a set of keys) is located

II. Dataset
Ø 5 search targets (shown next to each collage)
Ø 20 variations of each collage
Ø 6 participants searched for a target within a collage
Ø Yields 600 search tasks
Ø Collage types include Amazon book covers (distinct structure / distinct color)

III. Gaze Information
Ø Participants' gaze information was plotted against the collages
Ø The size of each fixation point corresponds to its duration
*Generated for all 600 search tasks

IV. Convolutional Neural Network
Ø Pre-trained siamese model:
  • Extracts 2 × 512 feature vectors from the linear layer
  • Compares them and outputs image similarities
Ø Pipeline: search target (top left) and extracted book covers 1…N → Conv. Net → Decision Network → image similarity

V. Euclidean Distances
Ø Euclidean distance plotted against fixation points (1–24) for each search task; lower distances indicate higher similarity between the search target and the fixation point
*Generated for 5 search tasks for each of the three types of collages
*Repeated for the 20 variations of each collage per search target
Ø Average Euclidean distance compared across collage types (Amazon, Mugshots, O'Reilly) and search targets 1–5
Ø The average Euclidean distances and lingering percentages imply that the set of Amazon collages was the most difficult search task.
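The per-fixation Euclidean distances of section V can be sketched in NumPy. The 512-dimensional vectors below stand in for the features the pre-trained siamese model extracts from its linear layer; the function name and the random toy features are illustrative assumptions, not the poster's code:

```python
import numpy as np

def fixation_distances(target_feat, fixation_feats):
    """Euclidean distance between the search target's feature vector
    and the feature vector extracted at each fixation point.
    Lower distance = higher visual similarity (as in section V)."""
    diffs = fixation_feats - target_feat          # broadcast over fixations
    return np.linalg.norm(diffs, axis=1)

# Toy example: 512-d features (the dimensionality the siamese model's
# linear layer outputs) for three fixation points.
rng = np.random.default_rng(0)
target = rng.normal(size=512)
fixations = np.stack([
    target,                                  # fixation on the target itself
    target + 0.1 * rng.normal(size=512),     # fixation on a similar region
    rng.normal(size=512),                    # fixation on an unrelated region
])
d = fixation_distances(target, fixations)
# d[0] is exactly 0; d grows as visual similarity drops
```

Plotting `d` against fixation index 1–24 for each of the 600 search tasks reproduces the kind of distance-vs-fixation curves the poster shows.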
Ø Collage types (continued from II): Mugshots (low structure / low color); O'Reilly book covers (low structure / distinct color)
Ø Architecture: two branch networks (repeated Conv + ReLU and max-pooling layers), with weights either un-shared (pseudo-siamese) or shared (siamese), feeding a decision network
*Distance plots repeated for all six users

VI. Future Work
Ø Send the feature vectors and search targets to a binary SVM
Ø Add weights to fixation points to account for gaze duration

VII. Acknowledgements
Thank you to Dr. Mubarak Shah and Dr. Niels Lobo for their guidance during the program. The research project presented is funded by the NSF REU Summer Program 2016.
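The duration-weighting idea in Future Work could be prototyped as a duration-weighted average of the per-fixation Euclidean distances. A minimal NumPy sketch; the specific weighting scheme, the `weighted_mean_distance` name, and the toy numbers are illustrative assumptions, not the poster's method:

```python
import numpy as np

def weighted_mean_distance(distances, durations):
    """Average Euclidean distance over a search task, weighting each
    fixation by its gaze duration (longer look -> larger weight).
    NOTE: this weighting scheme is an illustrative assumption."""
    w = np.asarray(durations, dtype=float)
    w = w / w.sum()                          # normalise durations to weights
    return float(np.dot(w, np.asarray(distances, dtype=float)))

# Toy example: three fixations; the longest fixation lands closest to the
# target, so the weighted mean falls below the unweighted mean.
dists = np.array([0.5, 2.0, 3.0])            # per-fixation distances
durs = np.array([600, 150, 150])             # gaze duration per fixation (ms)
wm = weighted_mean_distance(dists, durs)
um = float(dists.mean())
```

Because long fixations tend to land on target-like regions, such weighting would let the average-distance comparison across collage types reflect where participants actually lingered.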