Positive and Negative Randomness
Paul Vitanyi, CWI and University of Amsterdam
Joint work with Kolya Vereshchagin

Non-Probabilistic Statistics

Classic Statistics--Recalled

Probabilistic Sufficient Statistic

Kolmogorov complexity
K(x) = length of the shortest description of x; K(x|y) = length of the shortest description of x given y. A string x is random if K(x) ≥ |x|. K(x) - K(x|y) is the information y knows about x.
Theorem (Mutual Information). K(x) - K(x|y) = K(y) - K(y|x), up to an additive logarithmic term.
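This symmetry of information is the Kolmogorov-Levin theorem; a one-line sketch of why it holds (my addition, not on the slide), with all equalities up to O(log K(x,y)) additive terms:

```latex
\begin{align*}
K(x,y) &= K(x) + K(y \mid x) && \text{(chain rule, up to } O(\log) \text{ terms)}\\
K(x,y) &= K(y) + K(x \mid y) && \text{(same rule, roles of } x, y \text{ swapped)}\\
\Longrightarrow\quad K(x) - K(x \mid y) &= K(y) - K(y \mid x).
\end{align*}
```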

Randomness Deficiency

Algorithmic Sufficient Statistic where model is a set

Algorithmic sufficient statistic where model is a total computable function
Data is a binary string x; the model is a total computable function p; the prefix complexity is K(p) (size of the smallest TM computing p); the data-to-model code length is l_x(p) = min_d {|d| : p(d) = x}. x is typical for p if δ(x|p) = l_x(p) - K(x|p) is small. p is a sufficient statistic for x if K(p) + l_x(p) = K(x) + O(1) and p(d) = x for the d that achieves l_x(p).
Theorem: If p is a sufficient statistic for x, then x is typical for p.
p is a minimal sufficient statistic (sophistication) for x if K(p) is minimal.
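As a concrete illustration (my sketch, not from the slides): a two-part code K(p) + l_x(p) for the classic toy model "n-bit strings with exactly k ones". Since K is uncomputable, the model cost below is a crude, hypothetical stand-in (log2(n+1) bits to state k given n).

```python
import math

def two_part_code_length(x: str) -> float:
    """Length of a two-part code for x under the toy model
    'n-bit strings with exactly k ones': a stand-in for K(p) + l_x(p)."""
    n, k = len(x), x.count("1")
    model_cost = math.log2(n + 1)               # crude stand-in for K(p): state k given n
    data_to_model = math.log2(math.comb(n, k))  # l_x(p): index of x among the C(n,k) members
    return model_cost + data_to_model

print(two_part_code_length("1100101101"))  # typical-looking string: pays ~log2 C(10,6) bits
print(two_part_code_length("1111111111"))  # regular string: C(10,10)=1, only the model cost remains
```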

Graph of the structure function
[Figure: h_x(α) plotted with log |S| on the vertical axis and α on the horizontal axis; the curve stays above the lower bound h_x(α) ≥ K(x) - α.]

Minimum Description Length estimator; relations between estimators
Structure function: h_x(α) = min_S { log |S| : x ∈ S and K(S) ≤ α }.
MDL estimator: λ_x(α) = min_S { log |S| + K(S) : x ∈ S and K(S) ≤ α }.
Best-fit estimator: β_x(α) = min_S { δ(x|S) : x ∈ S and K(S) ≤ α }, where δ(x|S) = log |S| - K(x|S) is the randomness deficiency.
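A hedged sketch of λ_x(α) restricted to a tiny, hypothetical model family (the full cube, "strings with k ones", and the singleton {x}). Real K(S) is uncomputable, so each candidate model gets a crude coded-size stand-in:

```python
import math

def mdl_estimate(x: str, alpha: float):
    """Toy version of the MDL estimator lambda_x(alpha):
    minimize log|S| + K(S) over models S containing x with K(S) <= alpha.
    K(S) is uncomputable; each candidate gets a crude stand-in cost."""
    n, k = len(x), x.count("1")
    candidates = [
        # (name, stand-in for K(S), log2 |S|)
        ("full cube {0,1}^n",   math.log2(n + 1), float(n)),
        ("strings with k ones", math.log2(n + 1), math.log2(math.comb(n, k))),
        ("singleton {x}",       float(n),         0.0),
    ]
    feasible = [(name, ks + ls) for name, ks, ls in candidates if ks <= alpha]
    return min(feasible, key=lambda t: t[1]) if feasible else None

print(mdl_estimate("1100101101", alpha=8))   # budget excludes the singleton model
print(mdl_estimate("1100101101", alpha=20))  # singleton now feasible and wins
```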

Individual characteristics: more detail, especially for meaningful (nonrandom) data
We flip the graph so that log |S| is on the x-axis and K(S) is on the y-axis. This is essentially the rate-distortion graph for list (set) distortion.

Primogeniture of ML/MDL estimators
• ML/MDL estimators can be approximated from above;
• The best-fit estimator cannot be approximated, either from above or from below, to any precision;
• But the approximable ML/MDL estimators yield the best-fitting models, even though we do not know the achieved goodness-of-fit: ML/MDL estimators implicitly optimize goodness-of-fit.

Positive and Negative Randomness, and Probabilistic Models

Precision of following a given function h(α)
[Figure: h_x(α) tracking a prescribed shape h(α); data-to-model cost log |S| on the vertical axis, model cost α on the horizontal axis.]

Logarithmic precision is sharp
Lemma. Most strings of length n have structure functions close to the diagonal n - α. Those are the strings of high complexity, K(x) > n. For strings of low complexity, say K(x) < n/2, the number of candidate shape functions is much greater than the number of such strings, hence there cannot be a string for every such function. But we show that there is a string for every approximate shape of function.
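A back-of-the-envelope version of this counting argument (my reconstruction, not on the slide), comparing low-complexity strings with candidate shapes, taken here as nonincreasing integer functions on {0, ..., n/2} with values in {0, ..., n}:

```latex
\#\{x \in \{0,1\}^n : K(x) < n/2\} \;\le\; 2^{n/2}
\quad \text{(count the programs of length} < n/2\text{)},
\qquad\text{while}\qquad
\#\{\text{nonincreasing } h : \{0,\dots,n/2\} \to \{0,\dots,n\}\}
\;=\; \binom{n + n/2 + 1}{\,n/2 + 1\,} \;=\; 2^{\Theta(n)} \;\gg\; 2^{n/2}.
```

So exact realization of every shape is impossible; realization up to logarithmic precision is the best one can hope for, and it is achieved.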

All degrees of negative randomness
Theorem: For every length n there are strings x whose minimal sufficient statistic has any prescribed complexity between 0 and n (up to a log term).
Proof. All shapes of the structure function are possible, as long as the shape starts from n - k, decreases monotonically, and reaches 0 at k, for some k ≤ n (up to the precision in the previous slide).

Are there natural examples of negative randomness?
Question: Are there natural examples of strings with large negative randomness? Kolmogorov did not think they exist, but we know they are abundant. Maybe the information distance between strings x and y yields large negative randomness.

Information Distance
• Information Distance (Li, Vitanyi 96; Bennett, Gacs, Li, Vitanyi, Zurek 98): D(x,y) = min { |p| : p(x) = y and p(y) = x }, where p is a binary program for a universal computer (Lisp, Java, C, universal Turing machine).
Theorem.
(i) D(x,y) = max {K(x|y), K(y|x)}, where K(x|y), the Kolmogorov complexity of x given y, is the length of the shortest binary program that outputs x on input y;
(ii) D(x,y) ≤ D'(x,y) for any computable distance D' satisfying ∑_y 2^{-D'(x,y)} ≤ 1 for every x;
(iii) D(x,y) is a metric.
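D(x,y) itself is uncomputable. The authors' later applied work approximates max{K(x|y), K(y|x)} with a real compressor, roughly C(xy) - min{C(x), C(y)}; a minimal sketch using zlib as the stand-in compressor (the normalized form is the normalized compression distance, illustrative here rather than taken from this slide):

```python
import os
import zlib

def c(data: bytes) -> int:
    """Compressed size in bits: a crude computable stand-in for K()."""
    return 8 * len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Approximates max{K(x|y), K(y|x)} by C(xy) - min{C(x), C(y)},
    normalized by max{C(x), C(y)}."""
    cx, cy, cxy = c(x), c(y), c(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

u = b"the quick brown fox jumps over the lazy dog. " * 20
v = b"the quick brown fox jumps over the lazy cat. " * 20
w = os.urandom(len(u))  # incompressible stand-in for an unrelated random string

print(ncd(u, v))  # small: u and v share almost all their information
print(ncd(u, w))  # near 1: almost no shared information
```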

Not between random strings
• The information distance between random strings x and y of length n does not work.
• If x, y satisfy K(x|y), K(y|x) > n, then p = x XOR y, where XOR means bitwise exclusive-or, serves as a program to translate x to y and y to x. But if x and y are positively random, it appears that p is so too.
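A trivial sketch (my illustration) of why the single string p = x XOR y translates in both directions: XOR with a fixed pad is an involution.

```python
def xor(u: bytes, v: bytes) -> bytes:
    """Bitwise exclusive-or of two equal-length strings."""
    return bytes(a ^ b for a, b in zip(u, v, strict=True))

x = b"\x0f\x55\xaa\xf0"
y = b"\x33\xcc\x99\x66"

p = xor(x, y)          # the 'translation program': one n-bit string
assert xor(p, x) == y  # p carries x to y ...
assert xor(p, y) == x  # ... and y back to x, since XOR with p is self-inverse
print(p.hex())
```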

Selected Bibliography
N. K. Vereshchagin, P. M. B. Vitanyi, A theory of lossy compression of individual data, http://arxiv.org/abs/cs.IT/0411014, submitted.
P. D. Grunwald, P. M. B. Vitanyi, Shannon information and Kolmogorov complexity, IEEE Trans. Inform. Theory, submitted.
N. K. Vereshchagin, P. M. B. Vitanyi, Kolmogorov's structure functions and model selection, IEEE Trans. Inform. Theory, 50:12 (2004), 3265-3290.
P. Gacs, J. Tromp, P. M. B. Vitanyi, Algorithmic statistics, IEEE Trans. Inform. Theory, 47:6 (2001), 2443-2463.
Q. Gao, M. Li, P. M. B. Vitanyi, Applying MDL to learning best model granularity, Artificial Intelligence, 121:1-2 (2000), 1-29.
P. M. B. Vitanyi, M. Li, Minimum description length induction, Bayesianism, and Kolmogorov complexity, IEEE Trans. Inform. Theory, 46:2 (2000), 446-464.