Stochastic Separation Theorems, or the Blessing of Dimensionality
Gorban A. N. Joint work with Tyukin I. Y., University of Leicester
Plan
• Geometrical preliminaries, from Gibbs to Lévy
• Motivation from machine learning
• Stochastic separation theorems for equidistributions in high-dimensional balls
• Stochastic separation theorems for product distributions in a cube
• Stochastic separation by small neuronal ensembles
• Testing
Concentration of the volume
Almost all of the volume appears at r ≈ 1.
[Figure: volume of the n-dimensional ball as a function of the radius r, for dimensions n = 1, 2, 3, 4, 8, 16, 32, 64, 128.]
The volume of a high-dimensional ball is concentrated near its border (the sphere).
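Behind the picture is a one-line computation: the fraction of the volume of B_n(1) lying in the ε-layer near the boundary is

\[
\frac{V(B_n(1)) - V(B_n(1-\varepsilon))}{V(B_n(1))} \;=\; 1 - (1-\varepsilon)^n \;\longrightarrow\; 1 \quad (n \to \infty).
\]

For example, for ε = 0.01 and n = 1000 this fraction is 1 − 0.99^1000 ≈ 0.99996: almost nothing is left in the inner ball.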
Waist concentration
In high dimension, the volume of the ball is concentrated near its equator (near each equator).
Corollary: two random unit vectors are almost orthogonal with high probability (the angle between them is close to π/2).
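A minimal numerical illustration of this corollary (the sampling routine below is a sketch of mine, not part of the talk): the angles between independent random unit vectors concentrate around 90° as n grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_unit_vectors(n, m):
    """Sample m points independently and uniformly on the sphere S^{n-1}."""
    x = rng.standard_normal((m, n))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

for n in (3, 30, 300, 3000):
    u = random_unit_vectors(n, 2000)
    v = random_unit_vectors(n, 2000)
    # Angle between each pair of independent unit vectors, in degrees.
    angles = np.degrees(np.arccos(np.clip(np.sum(u * v, axis=1), -1.0, 1.0)))
    print(f"n={n:4d}: mean angle {angles.mean():6.2f} deg, std {angles.std():5.2f} deg")
```

The standard deviation of the angle shrinks roughly like 1/√n, so in high dimension almost every pair is nearly orthogonal.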
Extended re-edition of Paul Lévy's 1922 book “Leçons d'analyse fonctionnelle”
Between Gibbs and Lévy
• Lévy systematically used the asymptotic equivalence between the equidistribution on a sphere and the Gaussian distribution in high dimension. This is a particular case of the equivalence of the microcanonical and canonical ensembles (Gibbs, 20 years before Lévy).
• The microcanonical ensemble is a statistical ensemble in which the total energy of the system and the number of particles are fixed. It is the equidistribution on the isoenergetic surface with respect to the Liouville measure.
• The canonical ensemble is given by Boltzmann's distribution and maximizes entropy; the temperature is known but the energy is not fixed.
• These ensembles are equivalent in the thermodynamic limit (high dimension), as demonstrated by Gibbs!
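A quick empirical check of this equivalence (a sketch of mine, with illustrative sizes n and m): a single coordinate of a point equidistributed on the sphere of radius √n in R^n is approximately standard Gaussian.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 1000, 50_000   # dimension and sample size (illustrative choices)

# Equidistribution on the sphere of radius sqrt(n): normalise Gaussian vectors.
x = rng.standard_normal((m, n))
x *= np.sqrt(n) / np.linalg.norm(x, axis=1, keepdims=True)

# One fixed coordinate is close to N(0, 1) when n is large.
print(x[:, 0].mean(), x[:, 0].std())   # approx. 0.0 and 1.0
```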
Ensemble equivalence
[Figure: radial distributions for n = 2, 4, 8, 16, 32, 64, 128, normalised to unit mean and unit maximum.]
The probability density has its maximum at the origin, but the distribution of the radius is concentrated far from the origin.
Motivation: correction of legacy AI systems
The problem of correcting legacy systems
[Diagram: a legacy system with inputs, internal signals, and outputs.]
It works well but sometimes makes mistakes. Corrections are needed!
The problem of correcting legacy systems
[Diagram: a corrector attached to the legacy system's inputs, internal signals, and outputs.]
The corrector should take some input, internal, and output signals of the system and produce a correction.
The corrector should:
• be simple;
• not change the skills of the legacy system;
• allow fast, non-iterative learning;
• allow correction of new mistakes without destroying previous corrections.
Thus, it has to separate mistakes from correctly solved examples and correct the mistakes!
Stochastic separation theorems in high dimensions
Is the expected ε small in high dimension?
[Figure: a point separated from a finite set by a hyperplane with margin ε; the extreme points of the set.]
In high dimension, with high probability, all points of an exponentially large random set are extreme points: each of them can be separated from all the others by a hyperplane.
Stochastic separation for equidistribution in a high-dimensional ball
How to estimate the probability of separating one point from M points
• The probability that a random point belongs to an ε-layer near the boundary sphere of B_n(1).
• The probability that M random points from B_n(1) are all outside a hemisphere (half-ball) of radius ρ(ε).
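A hedged reconstruction of the estimate these two probabilities yield, following the scheme of the Neural Networks 2017 paper cited at the end: let x, x_1, …, x_M be i.i.d. and equidistributed in B_n(1). Then

\[
\mathbb{P}\{\|x\| \ge 1-\varepsilon\} = 1-(1-\varepsilon)^n,
\qquad
\rho(\varepsilon) = \sqrt{1-(1-\varepsilon)^2},
\]

and the spherical cap cut off by the hyperplane \(\langle y, x/\|x\|\rangle = 1-\varepsilon\) is contained in a half-ball of radius \(\rho(\varepsilon)\), so

\[
\mathbb{P}\{x \text{ is separated from } x_1,\dots,x_M\}
\;\ge\;
\bigl(1-(1-\varepsilon)^n\bigr)\Bigl(1-\tfrac{1}{2}\,\rho(\varepsilon)^{\,n}\Bigr)^{M}.
\]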
How to prove stochastic separation for the equidistribution in a high-dimensional ball
How big can M be for stochastic separation? Or, even simpler:
How big can M be: a simple example
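A simple numeric example of the estimate above (a sketch; the function name and the choice ε = 0.1 are mine): M can grow exponentially with the dimension n while the separation probability stays close to 1.

```python
import numpy as np

def separation_lower_bound(n, eps, M):
    """Lower bound (1-(1-eps)^n) * (1 - rho^n/2)^M on the probability
    that one random point of B_n(1) is separated, with margin eps,
    from M other i.i.d. uniform points; rho = sqrt(1-(1-eps)^2)."""
    rho = np.sqrt(1.0 - (1.0 - eps) ** 2)
    # exp(M*log1p(-x)) instead of (1-x)**M to stay accurate for tiny x.
    return (1.0 - (1.0 - eps) ** n) * np.exp(M * np.log1p(-0.5 * rho ** n))

# M exponentially large in n, yet the bound stays near 1:
for n, M in [(100, 1e30), (200, 1e60)]:
    print(f"n={n}: P >= {separation_lower_bound(n, eps=0.1, M=M):.6f}")
```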
Concentration for product distributions
Stochastic separation for product distributions in a cube
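For product distributions there is no rotational symmetry, so the separating functional is built from the data (a Fisher discriminant). Below is a minimal empirical sketch under assumptions of mine: the uniform product distribution in [0,1]^n and the Fisher-type separability test ⟨y − c, x − c⟩ < ⟨x − c, x − c⟩ with the empirical mean c; whitening is omitted because the cube's covariance is isotropic.

```python
import numpy as np

rng = np.random.default_rng(2)

def fisher_separable_fraction(n, M, trials=20):
    """Draw M i.i.d. points from the uniform product distribution in
    [0,1]^n, centre them at the empirical mean, and test the Fisher-type
    condition <y, x> < <x, x> (after centring) for one chosen point x
    against all other points y.  Whitening would only rescale both
    sides here.  Returns the fraction of trials in which x is separated."""
    hits = 0
    for _ in range(trials):
        pts = rng.random((M, n))
        centred = pts - pts.mean(axis=0)
        x, others = centred[0], centred[1:]
        if np.all(others @ x < x @ x):
            hits += 1
    return hits / trials

for n in (10, 30, 100):
    print(n, fisher_separable_fraction(n, M=10_000))
```

The separated fraction grows with n: in high dimension even one chosen point out of 10,000 is Fisher-separable from all the others with probability close to 1.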
Two-neuron separability
Linear versus two-neuron separation of a point from a random set (from the equidistribution in a ball)
• Blue line: estimate of the probability of separation by two uncorrelated neurons.
• Red line: estimate of the probability of linear separation.
• Dimension n = 30, ε = 1/5.
The expected ε is not necessarily small in high dimension!
Tests on real-life video pedestrian detection
• The legacy system is the VGG-11 convolutional network trained on 114,000 positive (pedestrian) and 375,000 negative (non-pedestrian) RGB images, resized to 128×128.
• We built Fisher linear discriminant correctors on varying numbers of false positives from the NOTTINGHAM video.
• The true positives (2,896 in total) and the false positives (189) in this video differed from those in the training set.
• To construct the covariance matrices, the original training data set was projected onto the first 2000 principal components.
• Typical performance of the legacy CNN system with the corrector is shown in the figures below.
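For concreteness, a minimal sketch of how such a one-shot Fisher-discriminant corrector can be built (the helper name, regularisation, and threshold choice are illustrative assumptions, not the authors' code):

```python
import numpy as np

def build_fisher_corrector(correct, mistakes, reg=1e-6):
    """Minimal sketch of a one-shot Fisher-discriminant corrector.
    `correct` and `mistakes` are (samples, features) arrays of
    legacy-system feature vectors.  Returns (w, b) such that
    w @ z > b flags the feature vector z as a likely mistake."""
    mu_c, mu_m = correct.mean(axis=0), mistakes.mean(axis=0)
    # Pooled within-class covariance (approximate), regularised so it is invertible.
    cov = np.cov(np.vstack([correct - mu_c, mistakes - mu_m]).T)
    w = np.linalg.solve(cov + reg * np.eye(cov.shape[0]), mu_m - mu_c)
    b = w @ (mu_c + mu_m) / 2.0   # threshold halfway between the class means
    return w, b

# Usage: override the legacy decision whenever w @ z > b.
```

Training is non-iterative (one linear solve), and adding a new corrector does not touch the legacy network's weights, which matches the corrector requirements listed earlier.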
The number of false positives removed as a function of the number of false positives the model was built on. Stars indicate the number of false positives used for building the model. Squares correspond to the number of false positives removed by the model.
The number of true positives removed as a function of the number of false positives removed
Knowledge transfer between AIs
In our experiments:
• The teacher AI, AIt, was modelled by a deep convolutional network, ResNet-18, with circa 11M trainable parameters.
• The teacher network was trained on a “teacher” dataset comprising 5.2M non-pedestrian (negative) and 600K pedestrian (positive) images.
• The student AI, AIs, was modelled by a linear classifier with 2016 trainable parameters.
• The values of these parameters were the result of training AIs on a “student” dataset, a sub-sample of the “teacher” dataset comprising 55K positives and 130K negatives.
• This choice of the AIs and AIt systems enabled us to emulate the interaction between edge-based AIs and their more powerful counterparts that could be deployed on larger servers or computational clouds.
Knowledge transfer between AIs: false positives induced by the teacher AI
These errors contain genuine false positives (images 12, 23–27) as well as mismatches by size (e.g., 1–7) and lookalikes (images 8, 11, 13, 15–17).
Knowledge transfer between AIs: student's errors before and after teaching
• Red circles show true positives as a function of false positives for the original student linear classifier.
• Blue stars and green triangles correspond to AIs after correction of false positives by the teacher (two different error-clustering algorithms).
• Black squares correspond to AIs after correction of false positives followed by correction of false negatives.
AUGMENTING ARTIFICIAL INTELLECT: A CONCEPTUAL FRAMEWORK
A paraphrase of Douglas C. Engelbart
[Diagram: Correctors; Agents; Supervisor; network learning; Student; return to the network; knowledge interiorisation.]
References
• A. N. Gorban, I. Y. Tyukin, D. V. Prokhorov, K. I. Sofeikov, Approximation with random bases: Pro et Contra, Information Sciences 364–365 (October 2016), 129–145, arXiv:1506.04631
• A. N. Gorban, I. Y. Tyukin, Stochastic Separation Theorems, Neural Networks 94 (October 2017), 255–259, arXiv:1703.01203
• A. N. Gorban, I. Romanenko, R. Burton, I. Y. Tyukin, One-Trial Correction of Legacy AI Systems and Stochastic Separation Theorems, arXiv:1610.00494
• I. Y. Tyukin, A. N. Gorban, K. Sofeikov, I. Romanenko, Knowledge Transfer Between Artificial Intelligence Systems, arXiv:1709.01547
The work was supported by Innovate UK Technology Strategy Board (Knowledge Transfer Partnership grants KTP 009890 and KTP 010522).