
Locality Sensitive Hashing and Large Scale Image Search
Yunchao Gong, UNC Chapel Hill, yunchao@cs.unc.edu

The problem
• Large-scale image search:
  – We have a candidate (query) image
  – Want to search a large database to find similar images
  – Want to search the Internet to find similar images
• Fast
• Accurate

Large-scale image search in a database
• Find similar images in a large database
[Kristen Grauman et al]

Internet-scale image search
• The Internet contains billions of images
• Search the Internet
• The challenge:
  – Need a way of measuring similarity between images (distance metric learning)
  – Need to scale to the Internet (how?)

Large-scale image search
• The representation must fit in memory (disk is too slow)
• Facebook has ~10 billion images ($10^{10}$)
• A PC has ~10 GB of memory (~$10^{11}$ bits)
• Budget of $10^{1}$ bits/image
[Fergus et al]
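The budget on this slide is an order-of-magnitude division of memory by image count (10 GB is about $8 \times 10^{10}$ bits, rounded up to $10^{11}$):

```latex
\frac{10^{11}\ \text{bits of memory}}{10^{10}\ \text{images}} = 10^{1} = 10\ \text{bits per image}
```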

Requirements for image search
• Search must be fast, accurate, and scalable to large data sets
• Fast
  – Kd-trees: a tree data structure to improve search speed
  – Locality Sensitive Hashing: hash tables to improve search speed
  – Small codes: compact binary codes (e.g., 010101101)
• Scalable
  – Require very little memory, enabling their use on standard hardware or even on handheld devices
• Accurate
  – Learned distance metric

Categorization of existing large-scale image search algorithms
• Tree-based structures
  – Spatial partitions (e.g., kd-trees) and recursive hyperplane decomposition provide an efficient means to search low-dimensional vector data exactly
• Hashing
  – Locality-sensitive hashing offers sub-linear time search by hashing highly similar examples together
• Binary small codes
  – Compact binary codes, with a few hundred bits per image

Tree-based structure
• Kd-tree
  – The kd-tree is a binary tree in which every node is a k-dimensional point
• No theoretical guarantee! Kd-trees are known to break down in practice for high-dimensional data, and cannot provide better than a worst-case linear query time guarantee
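A minimal sketch of kd-tree construction and exact nearest-neighbor search, assuming points are equal-length tuples and splitting at the median along axes chosen cyclically. `build_kdtree` and `nearest` are hypothetical names for illustration. The pruning test (is the splitting plane closer than the best match so far?) is what roughly explains the breakdown noted above: in high dimensions it rarely prunes, so the search visits most nodes.

```python
import math


def build_kdtree(points, depth=0):
    """Recursively split on axis depth % k at the median point."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }


def nearest(node, query, best=None):
    """Exact NN search; returns (point, distance)."""
    if node is None:
        return best
    point, axis = node["point"], node["axis"]
    dist = math.dist(point, query)
    if best is None or dist < best[1]:
        best = (point, dist)
    diff = query[axis] - point[axis]
    # Visit the side of the splitting plane that contains the query first.
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    # Only cross the plane if it is closer than the current best match.
    if abs(diff) < best[1]:
        best = nearest(far, query, best)
    return best


tree = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(nearest(tree, (9, 2)))  # nearest point (8, 1), distance about 1.414
```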

Locality Sensitive Hashing
• Hashing methods for fast nearest neighbor (NN) search
• Sub-linear time search by hashing highly similar examples together in a hash table
  – Take random projections of the data
  – Quantize each projection with a few bits
  – Strong theoretical guarantees
• More detail later

Binary small code (e.g., 1110101010)
• Binary?
  – e.g., 010101010101
  – Only binary codes (0/1) are used
• Small?
  – A small number of bits to code each image
  – e.g., 32 bits, 256 bits
• How could this kind of small code improve image search speed? More detail later.

Details of these algorithms
1. Locality sensitive hashing
  – Basic LSH
  – LSH for learned metrics
2. Small binary codes
  – Basic small code idea
  – Spectral hashing

1. Locality Sensitive Hashing
• The basic idea behind LSH is to project the data into a low-dimensional binary (Hamming) space; that is, each data point is mapped to a b-bit vector, called the hash key
• Each hash function h must satisfy the locality sensitive hashing property:
  $\Pr[h(x_i) = h(x_j)] = \mathrm{sim}(x_i, x_j)$
  – where $\mathrm{sim}(x_i, x_j) \in [0, 1]$ is the similarity function of interest
[Kristen Grauman et al]
M. Datar, N. Immorlica, P. Indyk, and V. Mirrokni. Locality-Sensitive Hashing Scheme Based on p-Stable Distributions. In SOCG, 2004.

LSH functions for dot products
• The hash function LSH uses to produce each bit of the hash code is a hyperplane separating the space (see the next slide for an example)
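For dot products, one standard choice (assumed here; it is Charikar's random-hyperplane construction) is $h_r(x) = 1$ if $r^\top x \ge 0$ and 0 otherwise, with r drawn from a zero-mean Gaussian. Two vectors at angle θ then collide with probability $1 - \theta/\pi$, so this family satisfies the LSH property with $\mathrm{sim} = 1 - \theta/\pi$. The sketch below checks that rate empirically; `random_hyperplane_hash`, `angle`, and `collision_rate` are hypothetical names.

```python
import math
import random


def random_hyperplane_hash(dim, rng):
    """Draw r ~ N(0, I) and return h_r(x) = 1 if r . x >= 0 else 0."""
    r = [rng.gauss(0.0, 1.0) for _ in range(dim)]

    def h(x):
        return 1 if sum(ri * xi for ri, xi in zip(r, x)) >= 0 else 0

    return h


def angle(x, y):
    """Angle between two nonzero vectors, in radians."""
    dot = sum(a * b for a, b in zip(x, y))
    norms = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return math.acos(max(-1.0, min(1.0, dot / norms)))


def collision_rate(x, y, trials=20000, seed=0):
    """Fraction of independently drawn hyperplanes that put x and y on the same side."""
    rng = random.Random(seed)
    same = 0
    for _ in range(trials):
        h = random_hyperplane_hash(len(x), rng)
        same += h(x) == h(y)
    return same / trials


x, y = [1.0, 0.0], [1.0, 1.0]
print(collision_rate(x, y), 1 - angle(x, y) / math.pi)  # both close to 0.75
```

Concatenating k such bits gives the k-bit hash key; the more bits, the lower the chance that dissimilar points share a key.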

1. Locality Sensitive Hashing
• Take random projections of the data
• Quantize each projection with a few bits
• No learning involved
[Figure: a feature vector is projected onto random directions and each projection is quantized to a bit. Fergus et al]

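How to search from a hash table?
• Hash each of the N data points $X_i$ with the hash functions $h_{r_1 \ldots r_k}$ into a hash table (keys such as 110101, 110111, 111101)
• Hash a new query Q with the same hash functions
• Search the hash table only for the small set of images (<< N) that share the query's key, and return them as results
[Figure: Kristen Grauman et al, modified by me]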
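A minimal single-table sketch of this lookup, assuming random-hyperplane bits as the hash functions and exact re-ranking of the bucket by Euclidean distance. The names `make_hash_functions`, `hash_key`, `build_table`, and `query` are hypothetical. Practical LSH usually builds several tables with independent hash functions to raise recall; that is omitted here.

```python
import random
from collections import defaultdict


def make_hash_functions(dim, k, seed=0):
    """k random hyperplanes r_1..r_k with Gaussian entries, one per key bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]


def hash_key(x, planes):
    """Concatenate the k sign bits into a string key such as '110101'."""
    return "".join(
        "1" if sum(r_i * x_i for r_i, x_i in zip(r, x)) >= 0 else "0" for r in planes
    )


def build_table(data, planes):
    """Map each key to the indices of the data points that hash to it."""
    table = defaultdict(list)
    for idx, x in enumerate(data):
        table[hash_key(x, planes)].append(idx)
    return table


def query(q, data, table, planes):
    """Rank only the points in q's bucket (a small set, << N) by Euclidean distance."""
    candidates = table.get(hash_key(q, planes), [])
    return sorted(candidates, key=lambda i: sum((a - b) ** 2 for a, b in zip(data[i], q)))


rng = random.Random(42)
data = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(1000)]
planes = make_hash_functions(dim=8, k=6)
table = build_table(data, planes)
print(len(table), "buckets;", len(query(data[0], data, table, planes)), "candidates instead of", len(data))
```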

Could we improve LSH?
• Could we use a learned metric to improve LSH?
• How can a learned metric improve LSH?
• Assume we have already learned a distance metric A from domain knowledge
• The learned similarity $x_i^\top A x_j$ is of better quality than simple metrics such as Euclidean distance

How to learn a distance metric?
• First, assume we have a set of domain knowledge
• Use the methods described in the last lecture to learn a distance metric A
• As discussed before, A is positive semidefinite, so it can be factored as $A = G^\top G$, which gives $x_i^\top A x_j = (G x_i)^\top (G x_j)$
• Thus $x \mapsto Gx$ is a linear embedding function that embeds the data into a lower-dimensional space
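A small sketch of the factorization step, assuming A is strictly positive definite so that a Cholesky factor exists (a rank-deficient A would need an eigendecomposition instead, which also yields a G with fewer rows, i.e., a lower-dimensional embedding). It checks that $x^\top A y$ equals the plain dot product of the embedded vectors. All function names are hypothetical helpers written for this example.

```python
import math


def cholesky(A):
    """Return lower-triangular L with A = L L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def transpose(M):
    return [list(row) for row in zip(*M)]


def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


A = [[4.0, 2.0], [2.0, 3.0]]
G = transpose(cholesky(A))  # A = L L^T = G^T G with G = L^T
x, y = [1.0, 2.0], [3.0, -1.0]
print(dot(x, matvec(A, y)), dot(matvec(G, x), matvec(G, y)))  # both 16, up to rounding
```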

LSH functions for learned metrics
• Given a learned metric with $A = G^\top G$
• G can be viewed as a linear parametric function, or a linear embedding function, for data x (data embedding)
• Thus the LSH function can be:
  $h_{r,A}(x) = 1$ if $r^\top G x \ge 0$, and $0$ otherwise
• The key idea is to first embed the data into a lower-dimensional space with G, and then do LSH in that lower-dimensional space
P. Jain, B. Kulis, and K. Grauman. Fast Image Search for Learned Metrics. In CVPR, 2008.
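A minimal sketch of the learned-metric hash $h_{r,A}(x) = [r^\top G x \ge 0]$, assuming G is already available (for instance from a factorization of A) and r is drawn from a standard Gaussian in the embedded space. `learned_metric_hash` and `learned_metric_code` are hypothetical names. In the usage line, G simply drops the third coordinate, so two points that differ only there receive identical codes: the hash respects whatever the metric ignores.

```python
import random


def learned_metric_hash(G, rng):
    """One hash bit for the metric A = G^T G: h(x) = 1 if r^T (G x) >= 0, with r ~ N(0, I)."""
    r = [rng.gauss(0.0, 1.0) for _ in range(len(G))]  # r lives in the embedded space

    def h(x):
        gx = [sum(g * xi for g, xi in zip(row, x)) for row in G]  # embed: x -> G x
        return 1 if sum(ri * v for ri, v in zip(r, gx)) >= 0 else 0

    return h


def learned_metric_code(x, hashes):
    """Concatenate the bits of several independent hash functions."""
    return [h(x) for h in hashes]


G = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # hypothetical embedding: keeps coordinates 0 and 1
rng = random.Random(0)
hashes = [learned_metric_hash(G, rng) for _ in range(16)]
print(learned_metric_code([1.0, 2.0, 5.0], hashes) == learned_metric_code([1.0, 2.0, -7.0], hashes))
```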

Some results for LSH
• Caltech-101 data set
• Goal: exemplar-based object categorization
  – Some exemplars are given
  – Want to categorize the whole data set

Results: object categorization
[Figure: results on the Caltech-101 database; legend includes CORR and ML = metric learning. Kristen Grauman et al]

Question?
• Is hashing fast enough?
• Is sub-linear search time fast enough?
  – The query time for retrieving $(1 + \epsilon)$-approximate near neighbors is bounded by $O(n^{1/(1+\epsilon)})$
• Is it fast enough?
• Is it scalable enough? (Does it fit in the memory of a PC?)

NO!
• Small binary codes can do better
• Cast an image to a compact binary code, with a few hundred bits per image
• Small codes make it possible to perform real-time searches over millions of Internet images using a single large PC
• Within 1 second! (0.146 s for 80 million images)
• 80 million images (~300 GB) → 120 M

Binary small code
• First introduced in text search/retrieval
• [3] introduced it for text document retrieval: Semantic Hashing. Ruslan Salakhutdinov and Geoffrey Hinton. International Journal of Approximate Reasoning, 2009.
• Introduced to computer vision by Antonio Torralba et al. [4]: A. Torralba, R. Fergus, and Y. Weiss. Small Codes and Large Databases for Recognition. In CVPR, 2008.

Semantic Hashing. Ruslan Salakhutdinov and Geoffrey Hinton. International Journal of Approximate Reasoning, 2009.
• A semantic hash function maps a query image to a binary code, which is used as an address in the address space of the database images
• Semantically similar images sit at nearby addresses
• Quite different from a (conventional) randomizing hash
[Figure: Fergus et al]

Semantic Hashing
• Similar points are mapped to similar small codes
• Then store these codes in memory and compute Hamming distances (very fast, carried out by hardware)

Overall query scheme
• Query image → feature vector → generate small code → binary code
• The database codes are stored in memory, and hardware computes the Hamming distances to the query code
• The retrieved images are returned
[Figure: Fergus et al; timings shown on the slide: ~1 ms (feature vector, in Matlab), <10 μs, <1 ms]

Searching framework
• Produce binary codes (e.g., 01010011010)
• Store these binary codes in memory
• Use hardware to compute the Hamming distances (very fast)
• Sort the Hamming distances to get the final ranking
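A minimal sketch of the four steps, assuming each code is packed into a Python integer so that Hamming distance becomes a popcount of an XOR (the operation that hardware executes with dedicated instructions). `pack_bits`, `hamming`, and `rank_by_hamming` are hypothetical names; on Python 3.10+ `(a ^ b).bit_count()` could replace the `bin(...).count("1")` call.

```python
def pack_bits(bits):
    """Pack a list of 0/1 bits into an int (bit i goes to position i)."""
    code = 0
    for i, b in enumerate(bits):
        code |= (b & 1) << i
    return code


def hamming(a, b):
    """Hamming distance between two packed codes: popcount of XOR."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query_code, db_codes):
    """Database indices sorted by Hamming distance to the query (ties broken by index)."""
    return sorted(range(len(db_codes)), key=lambda i: (hamming(query_code, db_codes[i]), i))


db = [pack_bits(b) for b in [[0, 0, 0, 0], [1, 1, 1, 1], [1, 0, 0, 0], [1, 1, 0, 0]]]
print(rank_by_hamming(pack_bits([1, 0, 0, 0]), db))  # [2, 0, 3, 1]
```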

The problem reduces to learning a small binary code
• Simplest method (threshold at the median)
• LSH is already able to produce binary codes
• Restricted Boltzmann Machines (RBM)
• Optimal small binary codes by spectral hashing

1. Simple binarization strategy
• Set a threshold (unsupervised), e.g., the median: values below it map to 0, values above it map to 1
[Figure: Fergus et al]
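A minimal sketch of median thresholding, assuming the threshold is applied independently to each dimension of the feature vector (so the code has one bit per dimension, and each bit is 1 for about half the training set). `fit_medians` and `binarize` are hypothetical names.

```python
import statistics


def fit_medians(data):
    """Per-dimension median of the training vectors; these are the thresholds."""
    return [statistics.median(col) for col in zip(*data)]


def binarize(x, medians):
    """Bit d is 1 if x[d] lies above the median of dimension d, else 0."""
    return [1 if v > m else 0 for v, m in zip(x, medians)]


train = [[1, 10], [2, 20], [3, 30], [4, 40]]
medians = fit_medians(train)  # [2.5, 25.0]
print([binarize(x, medians) for x in train])  # [[0, 0], [0, 0], [1, 1], [1, 1]]
```

No learning beyond computing medians is involved, which is why the slide lists it as the simplest unsupervised option.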

2. Locality Sensitive Hashing
• LSH already generates binary codes (unsupervised)
• Take random projections of the data
• Quantize each projection with a few bits
• No learning involved
[Figure: Fergus et al]

3. RBM [3] to generate codes
• We do not go into detail here; see [3]
• Use a deep neural network to train small codes
• Supervised method
R. R. Salakhutdinov and G. E. Hinton. Learning a Nonlinear Embedding by Preserving Class Neighbourhood Structure. In AISTATS, 2007.

LabelMe retrieval
• LabelMe is a large database of human-annotated images
• The goal of this experiment is to:
  – First generate small codes
  – Use Hamming distance to search for similar images
  – Sort the results to produce the final ranking
• Ground truth: Gist descriptor

Examples of LabelMe retrieval
• 12 closest neighbors under different distance metrics
[Figure: Fergus et al]

Test set 2: Web images
• 12.9 million images from the Tiny Images data set
• Collected from the Internet
• No labels, so the Euclidean distance between Gist vectors is used as the ground-truth distance
• Note: the Gist descriptor is a kind of feature widely used in computer vision for …

Examples of Web retrieval
• 12 neighbors using different distance metrics
[Figure: Fergus et al]

Web image retrieval
• Observation: longer codes (more bits) give better performance

Retrieval timings
[Figure: Fergus et al]

4. Spectral hashing
Y. Weiss, A. Torralba, and R. Fergus. Spectral Hashing. In NIPS, 2008.
• Closely related to the problem of spectral graph partitioning
• What makes a good code?
  – Easily computed for a novel input
  – Requires a small number of bits to code the full data set
  – Maps similar items to similar binary code words

Spectral Hashing
• To simplify the problem, first assume that the items have already been embedded in a Euclidean space
• Try to embed the data into a Hamming space
• A Hamming space is a binary space (e.g., 010101001…)
[Fergus et al]

Some definitions
• Let $\{y_i\}_{i=1}^{n}$ be the list of code words (binary vectors of length k) for n data points
• $W_{n \times n}$ is the affinity matrix characterizing similarities between data points, e.g., $W(i,j) = \exp(-\|x_i - x_j\|^2 / \epsilon^2)$

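Objective function
• Minimize $\sum_{ij} W_{ij} \|y_i - y_j\|^2$, where $y_i$ is the code word of point i and $W_{ij}$ the affinity between points i and j; that is, the average Hamming distance between similar points is minimal
• What does this objective function mean?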

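Objective of Spectral Hashing
• Minimize the average Hamming distance between points that are similar neighbors in Euclidean space
• Subject to the following constraints (which bound the objective):
  – The code is binary: $y_i \in \{-1, 1\}^k$
  – Each bit has a 50% chance of being 0 or 1: $\sum_i y_i = 0$
  – The bits are uncorrelated: $\frac{1}{n} \sum_i y_i y_i^\top = I$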
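A small sketch that evaluates the objective and checks the two constraints for a given ±1 code matrix, assuming a Gaussian affinity with bandwidth ε. For ±1 codes, $\|y_i - y_j\|^2$ is exactly 4 times the Hamming distance, which is why minimizing it minimizes Hamming distance between similar points. All function names are hypothetical.

```python
import math


def affinity(xs, eps):
    """W_ij = exp(-||x_i - x_j||^2 / eps^2)."""
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(u, v)) / eps ** 2) for v in xs] for u in xs]


def sh_objective(W, Y):
    """sum_ij W_ij ||y_i - y_j||^2 for code words y_i in {-1, +1}^k."""
    n = len(Y)
    return sum(
        W[i][j] * sum((a - b) ** 2 for a, b in zip(Y[i], Y[j])) for i in range(n) for j in range(n)
    )


def is_balanced(Y):
    """sum_i y_i = 0: every bit is +1 for exactly half of the points."""
    return all(sum(col) == 0 for col in zip(*Y))


def is_uncorrelated(Y):
    """(1/n) sum_i y_i y_i^T = I, checked entry by entry."""
    n, k = len(Y), len(Y[0])
    return all(
        sum(y[a] * y[b] for y in Y) == (n if a == b else 0) for a in range(k) for b in range(k)
    )


xs = [[0.0], [0.1], [5.0], [5.1]]
W = affinity(xs, eps=1.0)
good = [[-1], [-1], [1], [1]]  # nearby points share a bit
bad = [[-1], [1], [-1], [1]]   # nearby points are split
print(sh_objective(W, good) < sh_objective(W, bad))  # True
```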

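Graph illustration
• Points that are near each other in Euclidean space should be near each other in Hamming space
[Figure: nearby points in Euclidean space mapped to nearby codes in Hamming space]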

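Spectral relaxation
• By removing the binary constraint $y_i \in \{-1, 1\}^k$, we obtain an easy problem whose solutions are simply the k eigenvectors of $D - W$ with minimal eigenvalue (excluding the trivial constant eigenvector, whose eigenvalue is 0), where D is diagonal with $D_{ii} = \sum_j W_{ij}$
• Observation: similar to spectral graph partitioning
• Can be solved as a (generalized) eigenvalue problem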
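A small dependency-free sketch of the relaxation on a toy training set: build the Laplacian $D - W$, diagonalize it with cyclic Jacobi rotations (adequate only for small matrices), skip the constant eigenvector, and threshold the next k eigenvectors at zero. The Gaussian affinity, the Jacobi solver, and all function names are assumptions for illustration, not the paper's implementation; as the next slide notes, this only yields codes for the training points.

```python
import math


def gaussian_affinity(xs, eps):
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(u, v)) / eps ** 2) for v in xs] for u in xs]


def laplacian(W):
    """L = D - W with D_ii = sum_j W_ij."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)] for i in range(n)]


def jacobi_eigh(S, tol=1e-20, max_sweeps=100):
    """Eigenvalues and eigenvectors (columns of V) of a small symmetric matrix, unsorted."""
    n = len(S)
    A = [list(map(float, row)) for row in S]
    V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V P accumulates the eigenvectors
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [A[i][i] for i in range(n)], V


def spectral_codes(W, k):
    """Threshold the k smallest non-trivial eigenvectors of D - W at zero."""
    vals, V = jacobi_eigh(laplacian(W))
    order = sorted(range(len(vals)), key=lambda i: vals[i])
    chosen = order[1:k + 1]  # order[0] is the constant eigenvector with eigenvalue 0
    return [[1 if V[row][col] > 0 else 0 for col in chosen] for row in range(len(W))]


pts = [[0.0], [0.2], [0.4], [2.0], [2.2], [2.4]]
print(spectral_codes(gaussian_affinity(pts, 1.0), 1))  # the two clusters get different bits
```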

Problem?
• The relaxation only tells us how to compute the code representation of items in the training set
• What about the test set?
• Computing the codes for the test set is called the “out-of-sample extension”

Recall the problem
• Compute the eigenvectors and eigenvalues of the graph Laplacian $D - W$
• The eigenproblem is computationally expensive: $O(n^3)$
• It cannot handle very large data sets
• The solution is to use eigenfunctions

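One-dimensional eigenfunctions
• Multi-dimensional eigenfunctions are a difficult problem
• One-dimensional eigenfunctions for simple distributions are well studied!
• For example, for a 1-D uniform distribution on [a, b], the eigenfunctions and eigenvalues are
  $\Phi_k(x) = \sin\left(\frac{\pi}{2} + \frac{k\pi}{b-a}x\right), \qquad \lambda_k = 1 - e^{-\frac{\epsilon^2}{2}\left|\frac{k\pi}{b-a}\right|^2} \qquad (1)$
  where ε is the bandwidth of the Gaussian affinity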
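A direct transcription of equation (1), assuming a uniform distribution on [a, b] and affinity bandwidth `eps`; the function names are hypothetical. Thresholding $\Phi_1$ at zero splits the interval in half, $\Phi_2$ into alternating quarters, and $\lambda_k$ grows with k, so lower-frequency eigenfunctions are preferred.

```python
import math


def uniform_eigenfunction(k, a, b):
    """Phi_k(x) = sin(pi/2 + k*pi/(b - a) * x) for a 1-D uniform distribution on [a, b]."""
    omega = k * math.pi / (b - a)
    return lambda x: math.sin(math.pi / 2 + omega * x)


def uniform_eigenvalue(k, a, b, eps):
    """lambda_k = 1 - exp(-(eps^2 / 2) * (k*pi/(b - a))^2)."""
    omega = k * math.pi / (b - a)
    return 1.0 - math.exp(-(eps ** 2) / 2.0 * omega ** 2)


phi2 = uniform_eigenfunction(2, 0.0, 1.0)
print([1 if phi2(x) > 0 else 0 for x in (0.1, 0.4, 0.6, 0.9)])  # [1, 0, 0, 1]
print([round(uniform_eigenvalue(k, 0.0, 1.0, 0.5), 3) for k in (1, 2, 3)])  # increasing in k
```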

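Finding independent coordinates
• The problem reduces to finding several independent one-dimensional coordinates $s_1, s_2, \ldots, s_k$ such that the distribution factorizes: $P(x) = P(s_1) P(s_2) \cdots P(s_k)$
  – $s_1, s_2, \ldots, s_k$ are single coordinates
  – This means the whole distribution can be separated into one-dimensional pieces
  – The same holds for the eigenfunctions and eigenvalues: products of one-dimensional eigenfunctions are eigenfunctions of the full problem, with eigenvalues equal to the products of the one-dimensional eigenvalues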

The spectral hashing algorithm
• Select a set of n data points
• Find k independent coordinates from the data
• For each coordinate, assume the data distribution is uniform and compute its analytical eigenfunctions by (1)
• Use the analytical eigenfunctions to obtain the eigenvectors and eigenvalues for the whole data set
• Choose the k eigenfunctions with the smallest eigenvalues from all those computed
• Threshold the analytical eigenfunctions at zero to obtain binary codes
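A simplified sketch of these steps, under stated assumptions: the input coordinates are already independent (the coordinate-finding step, PCA in the original paper, is omitted), every coordinate has a nonzero range, and only single-coordinate eigenfunctions are considered. Because $\lambda_k$ in (1) increases with the frequency $k\pi/(b-a)$, picking the smallest eigenvalues is the same as picking the smallest frequencies, so the bandwidth ε is not needed here. `train_sh` and `encode_sh` are hypothetical names.

```python
import math


def train_sh(data, nbits):
    """Pick the nbits lowest-frequency 1-D eigenfunctions over all coordinates."""
    lo = [min(col) for col in zip(*data)]
    hi = [max(col) for col in zip(*data)]
    modes = []  # candidate (dimension d, frequency omega = k*pi/(b - a))
    for d in range(len(lo)):
        width = hi[d] - lo[d]
        for k in range(1, nbits + 1):
            modes.append((d, k * math.pi / width))
    # lambda_k is increasing in omega, so smallest eigenvalue == smallest frequency.
    modes.sort(key=lambda m: m[1])
    return {"lo": lo, "modes": modes[:nbits]}


def encode_sh(x, model):
    """Threshold each selected analytical eigenfunction at zero to get one bit."""
    bits = []
    for d, omega in model["modes"]:
        value = math.sin(math.pi / 2 + omega * (x[d] - model["lo"][d]))
        bits.append(1 if value > 0 else 0)
    return bits


train = [[0.0, 0.0], [4.0, 1.0], [1.0, 0.5], [3.0, 0.2]]
model = train_sh(train, nbits=2)
print([d for d, _ in model["modes"]])                                   # [0, 0]
print(encode_sh([0.5, 0.5], model), encode_sh([3.5, 0.5], model))       # [1, 1] [0, 1]
```

Note that both bits land on the wider coordinate: spectral hashing spends more bits on directions with a larger spread.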

Results for spectral hashing
• Synthetic results on a uniform distribution
• LabelMe retrieval results using spectral hashing to produce small binary codes

2-D uniform toy example: comparison
[Figure: Fergus et al]

Some results on LabelMe
• Observation: spectral hashing gets the best performance
[Figure: Fergus et al]

Summary
• Image search should be:
  – Fast
  – Accurate
  – Scalable
• Tree-based methods
• Locality Sensitive Hashing
• Binary small codes (state of the art)

References
1. M. Datar, N. Immorlica, P. Indyk, and V. Mirrokni. Locality-Sensitive Hashing Scheme Based on p-Stable Distributions. In SOCG, 2004.
2. P. Jain, B. Kulis, and K. Grauman. Fast Image Search for Learned Metrics. In CVPR, 2008.
3. R. Salakhutdinov and G. Hinton. Semantic Hashing. International Journal of Approximate Reasoning, 2009.
4. A. Torralba, R. Fergus, and Y. Weiss. Small Codes and Large Databases for Recognition. In CVPR, 2008.
5. Y. Weiss, A. Torralba, and R. Fergus. Spectral Hashing. In NIPS, 2008.

Questions?