Noise-Insensitive Boolean Functions are Juntas
Guy Kindler & Muli Safra


Dictatorship
Def: a boolean function $f: P([n]) \to \{-1, 1\}$ is a monotone e-dictatorship, denoted $f_e$, if $f(x) = T \iff e \in x$.

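Juntas
Def: a boolean function $f: P([n]) \to \{-1, 1\}$ is a j-junta if $\exists J \subseteq [n]$ with $|J| \le j$ s.t. for every $x \subseteq [n]$: $f(x) = f(x \cap J)$.
Def: f is an $[\epsilon, j]$-junta if $\exists$ a j-junta $f'$ s.t. $\Pr_x[f(x) \ne f'(x)] \le \epsilon$.
Def: f is an $[\epsilon, j, p]$-junta if $\exists$ a j-junta $f'$ s.t. $\Pr_{x \sim \mu_p}[f(x) \ne f'(x)] \le \epsilon$.
We would tend to omit p.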
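To make the j-junta definition concrete, here is a brute-force check. It is a minimal sketch, assuming subsets of [n] are encoded as bitmasks (element i corresponds to bit i-1) and a boolean function is a truth table indexed by those masks; the helper names `depends_only_on` and `find_junta` are hypothetical, and the search is exponential, so it only suits tiny n.

```python
from itertools import combinations

# Subsets of [n] = {1, ..., n} are n-bit masks: element i <-> bit i-1.
# A boolean function is a truth table f with f[x] in {-1, 1} for x in range(2**n).

def depends_only_on(f, n, J):
    """True iff f(x) == f(x ∩ J) for every x ⊆ [n]."""
    mask = sum(1 << (i - 1) for i in J)
    return all(f[x] == f[x & mask] for x in range(2 ** n))

def find_junta(f, n, j):
    """Return a smallest J ⊆ [n] with |J| <= j such that f is a J-junta, else None."""
    for size in range(j + 1):
        for J in combinations(range(1, n + 1), size):
            if depends_only_on(f, n, J):
                return set(J)
    return None

# Example: the parity of elements 1 and 3, as a {-1, 1}-valued function on [4].
n = 4
parity_13 = [(-1) ** (((x >> 0) & 1) ^ ((x >> 2) & 1)) for x in range(2 ** n)]
print(find_junta(parity_13, n, 2))  # {1, 3}
print(find_junta(parity_13, n, 1))  # None
```

The $[\epsilon, j]$ variant would additionally need the closest junta on each candidate J, not just an exact match.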

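Codes and Boolean Functions
Def: a code is a mapping of a set of n elements (strings of log n bits) to a set of m-bit strings, $C: [n] \to \{0, 1\}^m$, i.e. $C(e) = a_1 \ldots a_m$.
Def: let $S_j = \{e \in [n] \mid C(e)_j = T\}$, and let $\mathcal{S} = \{S_j\}_{j \in [m]}$.
[Figure: the codewords $C(1), \ldots, C(n)$ as rows of T/F entries; reading column 1 gives $S_1 = \{2, 3, n\}$, reading column m gives $S_m = \{1, n\}$.]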

Codes and Boolean Functions (cont.)
Def: let $E_e$ be the encoding of element e, i.e. $E_e: \mathcal{S} \to \{T, F\}$ with $E_e(S_j) = T \iff e \in S_j$.
Consider $\{E_e\}_{e \in [n]}$. Each $E_e$'s truth table represents a legal codeword of C (since $C(e) = E_e(S_1) \ldots E_e(S_m)$).
[Figure: the same codeword table, with $S_1 = \{2, 3, n\}$ and $S_m = \{1, n\}$.]

Long-Code
In the long code $L: [n] \to \{0, 1\}^{2^n}$, each element is encoded by a $2^n$-bit string.
This is the most extensive code, as $\mathcal{S} = P([n])$, i.e. the bits represent all subsets in $P([n])$.

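Long-Code (cont.)
Encoding an element $e \in [n]$: $E_e$ legally encodes the element e if $E_e = f_e$.
[Figure: the truth table of $E_e$ as a row of entries, F on subsets not containing e and T on subsets containing e.]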
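The long-code encoding is exactly the truth table of the monotone dictatorship. A minimal sketch, assuming the same bitmask encoding of subsets (element i corresponds to bit i-1) and using True for T; the function name `long_code` is hypothetical:

```python
def long_code(e, n):
    """Long-code encoding of e ∈ [n]: one bit per subset x ⊆ [n] (2**n bits).

    Subsets are n-bit masks (element i <-> bit i-1). Bit x is True iff e ∈ x,
    so the codeword is exactly the truth table of the monotone e-dictatorship f_e.
    """
    if not 1 <= e <= n:
        raise ValueError("e must lie in [n]")
    return [bool((x >> (e - 1)) & 1) for x in range(2 ** n)]

w = long_code(2, 3)
print(len(w), sum(w))  # 8 4
```

The length $2^n$ is what makes this code so redundant: every subset of [n] gets its own bit.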

Motivation – Testing the Long-Code
Def (a long-code test): given a codeword w, probe it in a constant number of entries, and
• accept w.h.p. if w is a monotone dictatorship;
• reject w.h.p. if w is not close to any monotone dictatorship.

Motivation – Testing the Long-Code (cont.)
Def (a long-code list-test): given a codeword w, probe it in a constant number of entries, and
• accept w.h.p. if w is a monotone dictatorship;
• reject w.h.p. if w is not even approximately determined by a small list of domain elements, that is, if there is no small $J \subseteq [n]$ s.t. f is close to some $f'$ with $f'(x) = f'(x \cap J)$ for all x.
Note: a long-code list-test distinguishes between the case where w is a dictatorship and the case where w is far from any junta.

Motivation – Testing the Long-Code (cont.)
The long-code test and the long-code list-test are essential tools in proving hardness results. Examples …
Hence, finding simple sufficient conditions for a function to be a junta is important.

Background
• Thm (Friedgut): a boolean function f with small average sensitivity is an $[\epsilon, j]$-junta.
• Thm (Bourgain): a boolean function f with small high-frequency weight is an $[\epsilon, j]$-junta.
• Thm (Kindler & Safra): a boolean function f with small high-frequency weight in a p-biased measure is an $[\epsilon, j]$-junta.
• Corollary: a boolean function f with small noise sensitivity is an $[\epsilon, j]$-junta.
• Parameters: average sensitivity, high-frequency weight, noise sensitivity.

Noise-Sensitivity
Idea: check how the value of f changes when the input is changed not on one, but on several coordinates.
[Figure: an input $x \subseteq [n]$, a noise subset I, and a replacement z placed on I.]

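Noise-Sensitivity (cont.)
Def ($(\epsilon, p, x)$-noise): let $0 < \epsilon < 1$ and $x \in P([n])$. Then $y \sim (\epsilon, p, x)$ if $y = (x \setminus I) \cup z$, where
• $I \sim \mu_\epsilon^{[n]}$ is a noise subset, and
• $z \sim \mu_p^{I}$ is a replacement.
Def ($\epsilon$-noise-sensitivity): let $0 < \epsilon < 1$; then $ns_\epsilon(f) = \Pr_{x \sim \mu_p,\ y \sim (\epsilon, p, x)}[f(x) \ne f(y)]$.
Note: the noise deletes each coordinate of x w.p. $\epsilon(1-p)$ and adds each coordinate outside x w.p. $\epsilon p$. Hence, when p = 1/2, it is equivalent to flipping each coordinate of x w.p. $\epsilon/2$.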
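The definition can be evaluated exactly for small n by summing over every x, every noise subset I, and every replacement $z \subseteq I$. A minimal sketch, assuming truth tables indexed by bitmasks (element i corresponds to bit i-1); the helper names are hypothetical:

```python
def popcount(m):
    return bin(m).count("1")

def biased_weight(p, ones, size):
    """Probability that a p-biased subset of a `size`-element set equals one
    specific subset with `ones` elements."""
    return p ** ones * (1 - p) ** (size - ones)

def submasks(mask):
    """Yield every submask of `mask`, from `mask` down to 0."""
    sub = mask
    while True:
        yield sub
        if sub == 0:
            return
        sub = (sub - 1) & mask

def noise_sensitivity(f, n, eps, p):
    """Exact ns_eps(f) = Pr[f(x) != f(y)], x ~ mu_p, y = (x minus I) union z,
    with noise subset I ~ mu_eps on [n] and replacement z ~ mu_p on I."""
    total = 0.0
    for x in range(1 << n):
        w_x = biased_weight(p, popcount(x), n)
        for I in range(1 << n):
            w_I = biased_weight(eps, popcount(I), n)
            for z in submasks(I):
                w_z = biased_weight(p, popcount(z), popcount(I))
                y = (x & ~I) | z          # drop x on I, then write z there
                if f[x] != f[y]:
                    total += w_x * w_I * w_z
    return total

dictator = [1 if x & 1 else -1 for x in range(8)]
print(round(noise_sensitivity(dictator, 3, 0.3, 0.2), 6))  # 0.096
```

For a dictatorship, y disagrees with x on the dictator only if that coordinate lies in I (w.p. ε) and the fresh bit differs from the old one (w.p. 2p(1-p)), so $ns_\epsilon(f) = 2\epsilon p(1-p)$, which is 0.096 for ε = 0.3, p = 0.2.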

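Noise-Sensitivity – Cont.
• Advantage: very efficiently testable (using only two queries) by a perturbation test.
• Def (perturbation test): choose $x \sim \mu_p$ and $y \sim (\epsilon, p, x)$, and check whether $f(x) = f(y)$. The probability that the test finds $f(x) \ne f(y)$ is exactly the noise sensitivity of f.
• Prop: the ε-noise-sensitivity is given by $ns_\epsilon(f) = \frac{1}{2} - \frac{1}{2} \sum_{S \subseteq [n]} (1 - \epsilon)^{|S|} \hat{f}(S)^2$, where $\hat f(S)$ are the Fourier coefficients with respect to $\mu_p$.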
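The two-query perturbation test is easy to simulate, and for p = 1/2 its reject rate can be compared with the Fourier formula of the Prop. A sketch under these assumptions: uniform characters $\chi_S(x) = (-1)^{|S \cap x|}$ in the formula, hypothetical helper names, and a seeded RNG so runs are repeatable.

```python
import random

def perturb(x, n, eps, p, rng):
    """Sample y ~ (eps, p, x): each coordinate enters the noise set I w.p. eps and,
    if it does, is overwritten by a fresh p-biased bit (the replacement z)."""
    y = x
    for i in range(n):
        if rng.random() < eps:          # coordinate i+1 is in the noise subset I
            if rng.random() < p:        # ... and the replacement z contains it
                y |= 1 << i
            else:
                y &= ~(1 << i)
    return y

def perturbation_test_reject_rate(f, n, eps, p, trials, seed=0):
    """Empirical Pr[f(x) != f(y)] over `trials` runs of the two-query test."""
    rng = random.Random(seed)
    rejects = 0
    for _ in range(trials):
        x = sum(1 << i for i in range(n) if rng.random() < p)   # x ~ mu_p
        y = perturb(x, n, eps, p, rng)
        rejects += f[x] != f[y]
    return rejects / trials

def ns_fourier_uniform(f, n, eps):
    """Prop for p = 1/2: ns = 1/2 - 1/2 * sum_S (1 - eps)^|S| * fhat(S)^2."""
    N = 1 << n
    total = 0.0
    for S in range(N):
        fhat = sum(f[x] * (-1) ** bin(x & S).count("1") for x in range(N)) / N
        total += (1 - eps) ** bin(S).count("1") * fhat ** 2
    return 0.5 - 0.5 * total

maj3 = [1 if bin(x).count("1") >= 2 else -1 for x in range(8)]
print(ns_fourier_uniform(maj3, 3, 0.2))
print(perturbation_test_reject_rate(maj3, 3, 0.2, 0.5, trials=100_000))
```

For the majority of 3 bits and ε = 0.2 the formula gives 0.136, and the simulated reject rate should land close to it.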

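Relation between Parameters
• Prop: small ns ⇒ small high-frequency weight.
Proof: by Parseval, $\sum_S \hat f(S)^2 = 1$, so $ns_\epsilon(f) = \frac{1}{2} \sum_S \left(1 - (1-\epsilon)^{|S|}\right) \hat f(S)^2 \ge \frac{1}{2}\left(1 - (1-\epsilon)^{k}\right) \|f^{>k}\|_2^2$. Therefore, if ns is small, then $\|f^{>k}\|_2^2 \le 2\, ns_\epsilon(f) / (1 - (1-\epsilon)^k)$ is small. Hence the high frequencies must have small weights (as $(1-\epsilon)^{|S|}$ is small for large |S|).
• Prop: small as ⇒ small high-frequency weight.
Proof: $as(f) = \sum_S |S| \hat f(S)^2 \ge k \|f^{>k}\|_2^2$, so $\|f^{>k}\|_2^2 \le as(f)/k$.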

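Average and Restriction
Def: let $I \subseteq [n]$ and $x \in P([n] \setminus I)$; the restriction function is $f_I[x]: P(I) \to \{-1, 1\}$, $f_I[x](y) = f(x \cup y)$.
Def: the average function is $A_I[f]: P([n] \setminus I) \to \mathbb{R}$, $A_I[f](x) = \mathbb{E}_{y \subseteq I}[f_I[x](y)]$.
[Figure: [n] split into I, where y ranges, and its complement, where x is fixed.]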
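Both definitions translate directly into code. A minimal sketch, assuming subsets as bitmasks and truth tables as lists (hypothetical helper names); `average` takes the bias p as a parameter, matching the p-biased setting:

```python
def restriction(f, I, x):
    """f_I[x] as a dict {y: f(x ∪ y)} for every y ⊆ I.
    I and x are bitmasks over [n] with x ∩ I = ∅."""
    if x & I:
        raise ValueError("x must lie in [n] minus I")
    out, y = {}, I
    while True:                      # enumerate all submasks y of I
        out[y] = f[x | y]
        if y == 0:
            return out
        y = (y - 1) & I

def average(f, I, x, p=0.5):
    """A_I[f](x) = E_{y ~ mu_p on I}[f_I[x](y)]."""
    size_I = bin(I).count("1")
    return sum(p ** bin(y).count("1") * (1 - p) ** (size_I - bin(y).count("1")) * v
               for y, v in restriction(f, I, x).items())

dictator = [1 if x & 1 else -1 for x in range(8)]   # dictator on element 1
print(restriction(dictator, 0b001, 0b110))           # the dictator's own coordinate
print(average(dictator, 0b110, 0b001))               # averaging other coordinates
```

Averaging over the dictator's own coordinate gives 0 (for p = 1/2), while averaging over the other coordinates leaves the value unchanged.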

Fourier Expansion
Prop: …
Corollary: …
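A standard fact linking the Fourier expansion to the average function is $A_I[f](x) = \sum_{S \cap I = \emptyset} \hat f(S)\chi_S(x)$: averaging over $y \subseteq I$ kills every character that touches I. A minimal numerical check under the uniform measure (p = 1/2), with hypothetical helper names:

```python
import random

def chi(S, x):
    """Character chi_S(x) = (-1)^{|S ∩ x|} (uniform measure, p = 1/2)."""
    return -1 if bin(S & x).count("1") % 2 else 1

def fourier(f, n):
    """fhat(S) = E_x[f(x) chi_S(x)] for every S ⊆ [n] (bitmasks)."""
    N = 1 << n
    return [sum(f[x] * chi(S, x) for x in range(N)) / N for S in range(N)]

def average_via_fourier(f, n, I, x):
    """Sum of fhat(S) chi_S(x) over the characters S disjoint from I."""
    fh = fourier(f, n)
    return sum(fh[S] * chi(S, x) for S in range(1 << n) if (S & I) == 0)

def average_direct(f, n, I, x):
    """A_I[f](x): uniform average of f(x ∪ y) over y ⊆ I (x disjoint from I)."""
    ys = [y for y in range(1 << n) if (y & ~I) == 0]
    return sum(f[x | y] for y in ys) / len(ys)

rng = random.Random(0)
f = [rng.choice((-1, 1)) for _ in range(16)]
print(average_via_fourier(f, 4, 0b0110, 0b1001), average_direct(f, 4, 0b0110, 0b1001))
```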

Variation
Def: the variation of f (formerly called influence): …
Prop: the following are equivalent definitions of the variation of f: …
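One common way to formalize the variation of f on a set I (an assumption here, since the slide's formulas did not survive) is the expected variance of the restriction, $\mathbb{E}_x[\mathrm{V}_y f_I[x](y)]$, which equals $\sum_{S \cap I \neq \emptyset} \hat f(S)^2$; for I = {i} and a boolean f it is the classical influence. A sketch under the uniform measure, with hypothetical helper names, computing all three:

```python
def chi(S, x):
    """chi_S(x) = (-1)^{|S ∩ x|} (uniform measure)."""
    return -1 if bin(S & x).count("1") % 2 else 1

def variation_by_variance(f, n, I):
    """E_{x ⊆ [n] minus I} Var_{y ⊆ I}[f(x ∪ y)] under the uniform measure."""
    ys = [y for y in range(1 << n) if (y & ~I) == 0]
    xs = [x for x in range(1 << n) if (x & I) == 0]
    total = 0.0
    for x in xs:
        vals = [f[x | y] for y in ys]
        mean = sum(vals) / len(vals)
        total += sum((v - mean) ** 2 for v in vals) / len(vals)
    return total / len(xs)

def variation_by_fourier(f, n, I):
    """Sum of fhat(S)^2 over the characters S that intersect I."""
    N = 1 << n
    fhat = [sum(f[x] * chi(S, x) for x in range(N)) / N for S in range(N)]
    return sum(c * c for S, c in enumerate(fhat) if S & I)

def influence(f, n, i):
    """Pr_x[f(x) != f(x △ {i})] for a single element i (1-based)."""
    bit = 1 << (i - 1)
    return sum(f[x] != f[x ^ bit] for x in range(1 << n)) / (1 << n)

maj3 = [1 if bin(x).count("1") >= 2 else -1 for x in range(8)]
print(variation_by_variance(maj3, 3, 0b001), variation_by_fourier(maj3, 3, 0b001),
      influence(maj3, 3, 1))
```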

Proof
Recall …
Therefore …

Proof – Cont.
Recall …
Therefore (by Parseval) …

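High/Low Frequencies and their Weights
Def: the high-frequency portion of f: $f^{>k} = \sum_{|S| > k} \hat f(S) \chi_S$.
Def: the low-frequency portion of f: $f^{\le k} = \sum_{|S| \le k} \hat f(S) \chi_S$.
Def: the high-frequency weight is $\|f^{>k}\|_2^2 = \sum_{|S| > k} \hat f(S)^2$.
Def: the low-frequency weight is $\|f^{\le k}\|_2^2 = \sum_{|S| \le k} \hat f(S)^2$.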
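The high/low split and its weights in code: a sketch under the uniform measure, with characters $\chi_S(x) = (-1)^{|S \cap x|}$ and hypothetical helper names. For a boolean f the two weights sum to 1 (Parseval).

```python
def chi(S, x):
    return -1 if bin(S & x).count("1") % 2 else 1

def split_by_frequency(f, n, k):
    """Return (f^{<=k}, f^{>k}) as truth tables, under the uniform measure."""
    N = 1 << n
    fhat = [sum(f[x] * chi(S, x) for x in range(N)) / N for S in range(N)]
    low = [sum(fhat[S] * chi(S, x) for S in range(N) if bin(S).count("1") <= k)
           for x in range(N)]
    high = [sum(fhat[S] * chi(S, x) for S in range(N) if bin(S).count("1") > k)
            for x in range(N)]
    return low, high

def weight(g):
    """||g||_2^2 = E_x[g(x)^2] under the uniform measure."""
    return sum(v * v for v in g) / len(g)

maj3 = [1 if bin(x).count("1") >= 2 else -1 for x in range(8)]
low, high = split_by_frequency(maj3, 3, 1)
print(weight(low), weight(high))  # 0.75 0.25
```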

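Low-Frequency Variation and Low-Frequency Average Sensitivity
Def: writing $vr_I(f)$ for the variation of f on I, the low-frequency variation is $vr_I^{\le k}(f) = vr_I(f^{\le k}) = \sum_{S \cap I \ne \emptyset,\ |S| \le k} \hat f(S)^2$.
Def: the average sensitivity is $as(f) = \sum_{i \in [n]} vr_{\{i\}}(f)$, and in Fourier representation $as(f) = \sum_{S} |S| \hat f(S)^2$.
Def: the low-frequency average sensitivity is $as^{\le k}(f) = \sum_{|S| \le k} |S| \hat f(S)^2$.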
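Both forms of the average sensitivity can be compared numerically. This sketch assumes the uniform measure and the edge-flip form $as(f) = \sum_i \Pr_x[f(x) \ne f(x \triangle \{i\})]$; passing k gives the low-frequency version $\sum_{|S| \le k} |S| \hat f(S)^2$. Helper names are hypothetical.

```python
def average_sensitivity(f, n):
    """as(f) = sum_i Pr_x[f(x) != f(x △ {i})] under the uniform measure."""
    N = 1 << n
    return sum(sum(f[x] != f[x ^ (1 << i)] for x in range(N)) / N for i in range(n))

def as_fourier(f, n, k=None):
    """sum_S |S| fhat(S)^2, restricted to |S| <= k when k is given."""
    N = 1 << n
    total = 0.0
    for S in range(N):
        size = bin(S).count("1")
        if k is not None and size > k:
            continue
        fhat = sum(f[x] * (-1 if bin(S & x).count("1") % 2 else 1) for x in range(N)) / N
        total += size * fhat ** 2
    return total

maj3 = [1 if bin(x).count("1") >= 2 else -1 for x in range(8)]
print(average_sensitivity(maj3, 3), as_fourier(maj3, 3), as_fourier(maj3, 3, k=1))
# 1.5 1.5 0.75
```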

Main Results
Theorem: ∃ a constant … > 0 s.t. any boolean function $f: P([n]) \to \{-1, 1\}$ satisfying … is an $[\epsilon, j]$-junta for $j = O(\epsilon^{-2} k^3 2^{k})$.
Corollary: fix a p-biased distribution $\mu_p$ over P([n]). Let ε > 0 be any parameter, and set $k = \log_{1-\epsilon}(1/2)$. Then ∃ a constant … > 0 s.t. any boolean function $f: P([n]) \to \{-1, 1\}$ satisfying … is an $[\epsilon, j]$-junta for $j = O(\epsilon^{-2} k^3 2^{k})$.
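With $k = \log_{1-\epsilon}(1/2)$ we have $(1-\epsilon)^{|S|} \le 1/2$ for every $|S| \ge k$, so the noise-sensitivity formula gives $ns_\epsilon(f) \ge \frac{1}{4} \|f^{>k}\|_2^2$; moreover $k < \ln 2 / \epsilon$. A small sketch (the function name `noise_frequency` is hypothetical) that evaluates k for a few values of ε:

```python
import math

def noise_frequency(eps):
    """k = log_{1-eps}(1/2), i.e. the (real) k with (1 - eps)**k == 1/2."""
    return math.log(0.5) / math.log(1 - eps)

for eps in (0.1, 0.01, 0.001):
    k = noise_frequency(eps)
    print(f"eps={eps}: k={k:.2f}, ln(2)/eps={math.log(2) / eps:.2f}")
```

The printed values show k approaching $\ln 2 / \epsilon$ from below as ε shrinks.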

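First Attempt: Following Friedgut’s Proof
Thm: any boolean function f is an $[\epsilon, j]$-junta for $j = 2^{O(as(f)/\epsilon)}$.
Proof:
1. Specify the junta: let $k = O(as(f)/\epsilon)$, fix a threshold $2^{-O(k)}$, and let J be the set of coordinates whose variation exceeds it.
2. Show that the complement of J has small variation.
[Figure: P([n]) with the junta J marked.]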

Following Friedgut – Cont.
Lemma: …
Proof: …
Now, let’s bound each term:
Prop: …
Proof: characters of size greater than k contribute to the average sensitivity at least $k \|f^{>k}\|_2^2$ (since $as(f) = \sum_S |S| \hat f(S)^2$).
[Figure: P([n]) with the junta J marked.]

Following Friedgut – Cont.
Prop: …
Proof: …
True only since … is a {-1, 0, 1} function; and we do not know whether as(f) is small!
So we cannot proceed this way with only $as^{\le k}$!

Important Lemma
Lemma: ∃ … > 0 s.t. for any … and any function $g: P([m]) \to \mathbb{R}$, the following holds: …
[Figure: the bound splits into a low-frequency term and a high-frequency term.]

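Beckner/Nelson/Bonami Inequality
Def: let $T_\delta$ be the following operator on any f: $T_\delta f = \sum_S \delta^{|S|} \hat f(S) \chi_S$.
Thm: for any $p \ge r$ and $\delta \le ((r-1)/(p-1))^{1/2}$: $\|T_\delta f\|_p \le \|f\|_r$.
Corollary: for f s.t. $f^{>k} = 0$ and any $p \ge 2$: $\|f\|_p \le (p-1)^{k/2} \|f\|_2$; in particular $\|f\|_4 \le 3^{k/2} \|f\|_2$.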
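The corollary can be sanity-checked numerically: for random real functions of degree at most k on a few coordinates (uniform measure), the ratio $\|f\|_4 / \|f\|_2$ should never exceed $3^{k/2}$. A sketch with hypothetical helper names; random sampling only illustrates the bound, it does not prove it.

```python
import random

def chi(S, x):
    return -1 if bin(S & x).count("1") % 2 else 1

def random_low_degree(n, k, rng):
    """Truth table of a random real f with f^{>k} = 0 (Gaussian coefficients)."""
    coeffs = {S: rng.gauss(0, 1) for S in range(1 << n) if bin(S).count("1") <= k}
    return [sum(c * chi(S, x) for S, c in coeffs.items()) for x in range(1 << n)]

def norm(values, q):
    """L_q norm under the uniform measure: (E_x |f(x)|^q)^(1/q)."""
    return (sum(abs(v) ** q for v in values) / len(values)) ** (1 / q)

def worst_ratio(n, k, trials=100, seed=1):
    """Largest observed ||f||_4 / ||f||_2 over random degree-<=k functions."""
    rng = random.Random(seed)
    return max(norm(f, 4) / norm(f, 2)
               for f in (random_low_degree(n, k, rng) for _ in range(trials)))

for k in range(4):
    print(k, round(worst_ratio(6, k), 3), round(3 ** (k / 2), 3))
```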

Probability Concentration
• Simple bound: … Proof: …
• Low-freq bound: let $g: P([m]) \to \mathbb{R}$ s.t. $g^{>k} = 0$, and let … > 0; then ∃ … > 0 s.t. … Proof: recall the corollary: …
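One standard way a low-frequency tail bound follows from the Bonami corollary is Markov's inequality applied to $g^4$; this is a sketch of that route, not necessarily the exact bound stated on the slide. For g with $g^{>k} = 0$ and any t > 0:

```latex
\Pr_x\big[|g(x)| \ge t\big] = \Pr_x\big[g(x)^4 \ge t^4\big]
  \le \frac{\mathbb{E}[g^4]}{t^4} = \frac{\|g\|_4^4}{t^4}
  \le \frac{\big(3^{k/2}\,\|g\|_2\big)^4}{t^4} = \frac{9^k\,\|g\|_2^4}{t^4}.
```

By comparison, the Chebyshev-type bound $\Pr[|g| \ge t] \le \|g\|_2^2 / t^2$ holds for any g, but decays only quadratically in $t / \|g\|_2$.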

Lemma’s Proof
Now, let’s prove the lemma. Bounding the low and high frequencies separately: … (by the low-freq bound) and … (by the simple bound).

Shallow Function
Def: a function f is linear if only characters of size at most 1 (the empty character and the singletons) have non-zero weight.
Def: a function f is shallow if f is either a constant or a dictatorship.
Claim: boolean linear functions are shallow.
[Figure: Fourier weight versus character size 0, 1, 2, 3, …, k, …, n.]
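For tiny n the claim can be verified exhaustively. This sketch (uniform measure, characters $\chi_S(x) = (-1)^{|S \cap x|}$, hypothetical helper names) enumerates all $2^{2^3} = 256$ boolean functions on 3 coordinates, keeps those with no weight on characters of size 2 or more, and checks that each is a constant or $\pm\chi_{\{i\}}$, i.e. a dictatorship or its negation:

```python
from itertools import product

def fourier(f, n):
    """fhat(S) = E_x[f(x) chi_S(x)] with chi_S(x) = (-1)^{|S ∩ x|}."""
    N = 1 << n
    return [sum(f[x] * (-1) ** bin(x & S).count("1") for x in range(N)) / N
            for S in range(N)]

def is_linear(f, n, tol=1e-12):
    """No weight on characters of size >= 2."""
    return all(abs(c) < tol for S, c in enumerate(fourier(f, n))
               if bin(S).count("1") >= 2)

def is_shallow(f, n):
    """Constant, or f = +chi_{i} / -chi_{i} for a single coordinate i."""
    N = 1 << n
    if len(set(f)) == 1:
        return True
    for i in range(n):
        chi_i = [(-1) ** ((x >> i) & 1) for x in range(N)]
        if list(f) in (chi_i, [-v for v in chi_i]):
            return True
    return False

n = 3
linear = [f for f in product((-1, 1), repeat=1 << n) if is_linear(f, n)]
print(len(linear), all(is_shallow(f, n) for f in linear))  # 8 True
```

The 8 survivors are the two constants and the six functions $\pm\chi_{\{i\}}$.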

Boolean Linear ⇒ Shallow
Claim: boolean linear functions are shallow.
Proof: let f be a boolean linear function. We next show:
1. …
2. ∃ $i_0$ s.t. … (i.e. …)
And conclude that either …, i.e. f is shallow, or … .

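Claim 1
Claim 1: let f be a boolean linear function; then ∃ $i_0$ s.t. …
Proof: w.l.o.g. assume …; for any $z \subseteq \{3, \ldots, n\}$, consider $x_{00} = z$, $x_{10} = z \cup \{1\}$, $x_{01} = z \cup \{2\}$, $x_{11} = z \cup \{1, 2\}$; then … .
The next value must be far from {-1, 1}: a contradiction (f is boolean)! Therefore … .
[Figure: the values of f at the four points, 1 and -1, with the fourth value marked “?”.]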
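The four-point step can be made explicit. As an illustration, assume that both $\hat f(\{1\})$ and $\hat f(\{2\})$ are non-zero, write $\chi_i(x) = (-1)^{[i \in x]}$, and let $c = \hat f(\emptyset) + \sum_{i \ge 3} \hat f(\{i\}) \chi_i(z)$. A linear f then takes the following values on the four points:

```latex
\begin{aligned}
f(x_{00}) &= c + \hat f(\{1\}) + \hat f(\{2\}), &\quad f(x_{10}) &= c - \hat f(\{1\}) + \hat f(\{2\}),\\
f(x_{01}) &= c + \hat f(\{1\}) - \hat f(\{2\}), &\quad f(x_{11}) &= c - \hat f(\{1\}) - \hat f(\{2\}),
\end{aligned}
\qquad\Longrightarrow\qquad
f(x_{11}) = f(x_{10}) + f(x_{01}) - f(x_{00}).
```

Since both coefficients are non-zero, $f(x_{10})$ and $f(x_{01})$ each differ from $f(x_{00})$; a boolean f then forces $f(x_{10}) = f(x_{01}) = -f(x_{00})$, hence $f(x_{11}) = -3 f(x_{00}) \notin \{-1, 1\}$.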

Claim 2
Claim 2: let f be a boolean function s.t. … . Then either … or … .
Proof: consider $f(\emptyset)$ and $f(\{i_0\})$: … . Then …, but f is boolean; hence …, and therefore … .

Proving FKN: Almost-Linear ⇒ Close to Shallow
Theorem: let $f: P([n]) \to \mathbb{R}$ be linear, let …, and let $i_0$ be the index s.t. … is maximal; then … .
Note: f is linear; hence, assuming w.l.o.g. that $i_0 = 1$, all we need to show is … . We show that in the following claim and lemma.

Corollary
Corollary: let f be linear and …; then ∃ a shallow boolean function g s.t. … .
Proof: let …, and let g be the boolean function closest to l. Then …; this is true, as … is small (by the theorem), and additionally … is small, since … .

Claim 1
Claim 1: let f be linear and, w.l.o.g., assume …; then, with the global constant $c = \min\{p, 1 - p\}$, … .
[Figure: Fourier weight over the characters ∅, {1}, {2}, …, {i}, …, {n}, {1, 2}, {1, 3}, …, {n-1, n}, …, S, …, {1, …, n}; a group of characters is marked as each of weight no more than c.]

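Proof of Claim 1
Proof: assume …; for any $z \subseteq \{3, \ldots, n\}$, consider $x_{00} = z$, $x_{10} = z \cup \{1\}$, $x_{01} = z \cup \{2\}$, $x_{11} = z \cup \{1, 2\}$; then … .
The next value must be far from {-1, 1}! A contradiction (to …)!
[Figure: the values of f at the four points, 1 and -1, with the fourth value marked “?”.]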

Lemma
Lemma: let g be linear, let …, and assume …; then … .
Corollary: the theorem follows from the combination of Claim 1 and the lemma. Let m be the minimal index s.t. …, and consider … . If m = 2, the theorem is obtained (by the lemma); otherwise …: a contradiction to the minimality of m.

Lemma’s Proof
Lemma’s proof: note … . Hence, all we need to show is that … .
Intuition: note that |g| and |b| are far from 0 (since |g| is …-close to 1, and … c-close to b). Assume b > 0; then for almost all inputs x, g(x) = |g(x)| (as …). Hence E[g] ≈ E[|g|], and therefore var(g) ≈ var(|g|).

Proof map: |g|, |b| are far from 0 ⇒ g(x) = |g(x)| for almost all x ⇒ E[g] ≈ E[|g|] ⇒ var(g) ≈ var(|g|).
$E^2[g] - E^2[|g|] = 2E^2[|g| \cdot 1_{\{f < 0\}}] \le o(\ldots)$ (by Azuma’s inequality).
We next show var(g) ≈ var(|g|): by the premise, …; however, …; therefore … .

Central Ideas: Linear Functions and Random Partition
Idea 1: recall … (the theorem’s premise). If f is close to linear, then f is close to shallow (by [FKN]).
Idea 2: let … . Partition J into $I_1, \ldots, I_r$. Since r is large, w.h.p. $f_I[x]$ is close to linear (low-frequency characters intersect each I in at most one element).
[Figure: P([n]) with J partitioned into $I_1, I_2, \ldots, I_r$.]

Variation Lemma
Lemma (variation): ∃ … > 0 and r ≫ k s.t. … .
Corollary: for most I and x, $f_I[x]$ is almost constant.
[Figure: P([n]) with J partitioned into $I_1, I_2, \ldots, I_r$.]

Using Idea 2
By a union bound on $I_1, \ldots, I_r$: … (set …).
Let $f'(x) = \mathrm{sign}(A_J[f](x \setminus J))$. Since f' is the boolean function closest to $A_J[f]$, … . Hence f is an $[\epsilon, j]$-junta.
[Figure: P([n]) with J partitioned into $I_1, I_2, \ldots, I_r$.]
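The step $f' = \mathrm{sign}(A_J[f])$ rests on the fact that, among boolean functions of the coordinates outside J, the sign of the average is the closest to f. A minimal sketch under the uniform measure (p = 1/2), with hypothetical helper names, that checks this by brute force on tiny n:

```python
import random
from itertools import product

def sign_of_average_junta(f, n, J):
    """f'(x) = sign(A_J[f](x minus J)) under the uniform measure (ties -> +1).

    J is a bitmask; f' depends only on the coordinates outside J."""
    sums = {}
    for x in range(1 << n):
        outside = x & ~J                  # the part of x outside J
        sums[outside] = sums.get(outside, 0) + f[x]
    return [1 if sums[x & ~J] >= 0 else -1 for x in range(1 << n)]

def distance(f, g):
    """Pr_x[f(x) != g(x)] under the uniform measure."""
    return sum(a != b for a, b in zip(f, g)) / len(f)

def best_distance_bruteforce(f, n, J):
    """min over all boolean g with g(x) = g(x minus J) of Pr[f != g] (exponential)."""
    patterns = sorted({x & ~J for x in range(1 << n)})
    best = 1.0
    for values in product((-1, 1), repeat=len(patterns)):
        table = dict(zip(patterns, values))
        best = min(best, distance(f, [table[x & ~J] for x in range(1 << n)]))
    return best

rng = random.Random(3)
f = [rng.choice((-1, 1)) for _ in range(16)]
J = 0b0011
print(distance(f, sign_of_average_junta(f, 4, J)), best_distance_bruteforce(f, 4, J))
```

Ties (average exactly 0) cost the same either way, so breaking them toward +1 does not affect optimality.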

Variation Lemma – Proof Plan
Lemma (variation): ∃ … > 0 and r ≫ k s.t. … .
Sketch for proving the variation lemma:
1. w.h.p. $f_I[x]$ is almost linear;
2. w.h.p. $f_I[x]$ is close to shallow;
3. $f_I[x]$ cannot be close to a dictatorship too often.

Lemma Proof
Proof map: 1. w.h.p. $f_I[x]$ is almost linear; 2. w.h.p. $f_I[x]$ is close to shallow; 3. $f_I[x]$ cannot be close to a dictatorship too often.
The low-frequency characters are small and r is rather large; hence, w.h.p., each of these characters intersects each I in at most one element, i.e. it is linear on each I.
[Figure: P([n]) with J partitioned into $I_1, I_2, \ldots, I_r$.]
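A quick way to see why a large r helps: assume, for illustration only, that each element of a fixed low-frequency character S, |S| = k, lands in a given part I independently w.p. 1/r. Then S fails to be linear on I only if two of its elements land there, which happens w.p. at most $\binom{k}{2}/r^2$ (and, by a union bound over the r parts, w.p. at most $\binom{k}{2}/r$ for some part). A sketch comparing the exact probability with the pairs bound (hypothetical function names):

```python
from math import comb

def prob_not_linear_on_part(k, r):
    """Pr[|S ∩ I| >= 2] for a fixed part I, when |S| = k and each element of S
    independently lands in I w.p. 1/r (assumed model of the random partition)."""
    q = 1 / r
    return 1 - (1 - q) ** k - k * q * (1 - q) ** (k - 1)

def pairs_bound(k, r):
    """Union bound over the C(k, 2) pairs of elements: C(k, 2) / r**2."""
    return comb(k, 2) / r ** 2

for k, r in [(3, 10), (3, 100), (10, 100), (10, 1000)]:
    print(k, r, prob_not_linear_on_part(k, r), pairs_bound(k, r))
```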

Almost Linear ⇒ Almost Shallow
Proof map: 1. w.h.p. $f_I[x]$ is almost linear; 2. w.h.p. $f_I[x]$ is close to shallow; 3. $f_I[x]$ cannot be close to a dictatorship too often.
Theorem ([FKN]): ∃ a global constant M s.t. for every boolean function f there is a shallow boolean function g s.t. $\|f - g\|_2^2 \le M \cdot \|f^{>1}\|_2^2$.
Hence, if $\|f_I[x]^{>1}\|_2^2$ is small, then $f_I[x]$ is close to shallow!

Preliminary Lemma and Props
Prop: if $f_I[x]$ is a dictatorship, then ∃ a coordinate i s.t. … (where p is the bias).
[Figure: Fourier weight over the characters {1}, {2}, …, {i}, …, {n}, {1, 2}, {1, 3}, …, {n-1, n}, …, S, …, {1, …, n}, with a total weight of no more than 1 - p marked.]
Corollary (from [FKN]): ∃ a global constant M s.t. for every boolean function h, either … or … .

Few Dictatorships
Proof map: 1. w.h.p. $f_I[x]$ is almost linear; 2. w.h.p. $f_I[x]$ is close to shallow; 3. $f_I[x]$ cannot be close to a dictatorship too often.
Lemma: ∃ … > 0 s.t. for any … and any function $g: P([m]) \to \mathbb{R}$, the following holds: … .
Def: let $D_I$ be the set of $x \in P([n] \setminus I)$ s.t. $f_I[x]$ is a dictatorship, i.e. … .
Next we show that $|D_I|$ must be small; hence, for most x, $f_I[x]$ is constant.

$|D_I|$ must be small
Lemma: …
Proof: let …; by Parseval, …; then … .
Each S is counted for only one index $i \in I$. (Otherwise, if S were counted for both i and j in I, then $|S \cap I| > 1$!)

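Simple Prop
Prop: let $\{a_i\}_{i \in I}$ be a sub-distribution, that is, $\sum_{i \in I} a_i \le 1$ and $0 \le a_i$; then $\sum_{i \in I} a_i^2 \le \max_{i \in I}\{a_i\}$.
Proof: each $a_i$ is no more than $a_{\max} = \max_{i \in I}\{a_i\}$, so $\sum_{i \in I} a_i^2 \le a_{\max} \sum_{i \in I} a_i \le a_{\max}$.
[Figure: the values $a_1, \ldots, a_n$ as bars, each at most $a_{\max}$; the extreme case has $1/a_{\max}$ bars of height $a_{\max}$.]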

$|D_I|$ must be small – Cont.
Therefore … (since …). Hence … .

Obtaining the Lemma
It remains to show that indeed: …
Prop 1: …
• Recall …
• However …
Prop 2: $\{\chi_S\}_S$ are orthonormal, and …

Obtaining the Lemma – Cont.
Prop 3: …
Proof: separate by frequency:
• Small frequencies: …
• Large frequencies: …
Corollary (from Props 2, 3): …

Obtaining the Lemma – Cont.
Recall: by the corollary from [FKN], either … or … . Hence $|D_I|$ is small.
By the corollary, combined with Prop 1, we obtain: …

The End

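XOR Test
Let $\mathcal{D}$ be a random procedure for choosing two disjoint subsets x, y s.t., independently for each $i \in [n]$: $i \in x \setminus y$ w.p. 1/3, $i \in y \setminus x$ w.p. 1/3, and $i \notin x \cup y$ w.p. 1/3.
Def (XOR test): pick $\langle x, y \rangle \sim \mathcal{D}$; accept if $f(x) \ne f(y)$, and reject otherwise.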

Example
Claim: let f be a dictatorship; then f passes the XOR test w.p. 2/3.
Proof: let i be the dictator; then $\Pr_{\langle x, y \rangle \sim \mathcal{D}}[f(x) \ne f(y)] = \Pr_{\langle x, y \rangle \sim \mathcal{D}}[i \in x \cup y] = 2/3$.
Claim: let f' be ε-close to a dictatorship f; then f' passes the XOR test w.p. $2/3 - 2/3(\epsilon - \epsilon^2)$.
Proof: see next slide…
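The XOR-test acceptance probability can be computed exactly for small n by enumerating all $3^n$ outcomes of $\mathcal{D}$, each equally likely. A minimal sketch (hypothetical helper name) that confirms the 2/3 figure for a dictatorship:

```python
from fractions import Fraction
from itertools import product

def xor_test_accept_prob(f, n):
    """Exact Pr[f(x) != f(y)] for <x, y> ~ D: each i in [n] independently goes to
    x only, y only, or neither (w.p. 1/3 each), so x and y are always disjoint."""
    accepted = 0
    for choice in product(("x", "y", "none"), repeat=n):   # 3**n equally likely outcomes
        x = sum(1 << i for i, c in enumerate(choice) if c == "x")
        y = sum(1 << i for i, c in enumerate(choice) if c == "y")
        accepted += f[x] != f[y]
    return Fraction(accepted, 3 ** n)

n = 4
dictator = [1 if x & 1 else -1 for x in range(1 << n)]   # dictator on element 1
print(xor_test_accept_prob(dictator, n))                  # 2/3
```

By contrast, a constant function is never accepted, and the parity of two coordinates is accepted w.p. 4/9.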


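Local Maximality
Def: f is locally maximal with respect to a test if changing f on any single input $x_0$ cannot increase its acceptance probability, that is, for every f' obtained from f by such a change: $\Pr_{\langle x, y \rangle \sim \mathcal{D}}[f(x) \ne f(y)] \ge \Pr_{\langle x, y \rangle \sim \mathcal{D}}[f'(x) \ne f'(y)]$.
Def: let $\mathcal{D}_x$ be the distribution of all y such that $\langle x, y \rangle \sim \mathcal{D}$.
Claim: if f is locally maximal, then $f(x) = -\mathrm{sign}(\mathbb{E}_{y \sim \mathcal{D}_x}[f(y)])$.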
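The claim says a locally maximal f disagrees with the majority of its partners. Given x, a partner $y \sim \mathcal{D}_x$ is a uniformly random subset of $[n] \setminus x$ (each remaining coordinate lands in y w.p. (1/3)/(2/3) = 1/2). The sketch below (hypothetical helper names, tiny n only) runs a local search that enforces exactly this rule and checks that the changes never lower the acceptance probability. As an assumption of the sketch, the pair y = x (possible only for $x = \emptyset$, and never accepted) is left out of the sum.

```python
import random
from fractions import Fraction
from itertools import product

def accept_prob(f, n):
    """Exact XOR-test acceptance Pr[f(x) != f(y)] over <x, y> ~ D."""
    hits = 0
    for choice in product((0, 1, 2), repeat=n):   # 0: in x, 1: in y, 2: neither
        x = sum(1 << i for i, c in enumerate(choice) if c == 0)
        y = sum(1 << i for i, c in enumerate(choice) if c == 1)
        hits += f[x] != f[y]
    return Fraction(hits, 3 ** n)

def neighbour_sum(f, n, x):
    """Sum of f(y) over y ⊆ [n] minus x with y != x (D_x is uniform on these y)."""
    comp = ((1 << n) - 1) & ~x
    total, y = 0, comp
    while True:
        if y != x:
            total += f[y]
        if y == 0:
            return total
        y = (y - 1) & comp

def local_search(f, n):
    """Set f(x) = -sign(neighbour sum) until no single change applies.
    Each change strictly increases accept_prob, so the loop terminates."""
    f = list(f)
    changed = True
    while changed:
        changed = False
        for x in range(1 << n):
            s = neighbour_sum(f, n, x)
            if s != 0 and f[x] != (-1 if s > 0 else 1):
                f[x] = -1 if s > 0 else 1
                changed = True
    return f

rng = random.Random(7)
start = [rng.choice((-1, 1)) for _ in range(8)]
final = local_search(start, 3)
print(accept_prob(start, 3), "->", accept_prob(final, 3))
```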

Claim: …
Proof: immediate from the Fourier expansion and the fact that $y \cap x = \emptyset$.

Conjecture: let f be locally maximal (with respect to the XOR test), and assume f passes the XOR test w.p. 1/2 + ε, for some constant ε > 0; then f is close to a junta.