
Analysis of Boolean Functions
Fourier Analysis, Projections, Influence, Juntas, etc., and (some) applications
Slides prepared with the help of Ricky Rosen

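Boolean Functions
Def: A Boolean function is $f: P([n]) \to \{-1, 1\}$, or equivalently $f: \{-1, 1\}^n \to \{-1, 1\}$. An input can be read either as an element of the power set of $[n]$ (choose the locations of the $-1$s) or as a sequence of $-1$s and $1$s.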

Functions as Vector-Spaces
A function can be represented as a string of size $2^n$ (i.e., its truth table).
[Figure: the truth table of f, listing the value of f on each of the $2^n$ inputs.]

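Functions' Vector-Space
A function f is a vector in $\mathbb{R}^{2^n}$.
- Addition: $(f+g)(x) = f(x) + g(x)$
- Multiplication by a scalar: $(c \cdot f)(x) = c \cdot f(x)$
- Inner product (normalized): $\langle f, g \rangle = \mathbb{E}_x[f(x) g(x)] = 2^{-n} \sum_x f(x) g(x)$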

Boolean function as voting system
Consider n agents, each voting either "for" (T = -1) or "against" (F = 1).
[Figure: a row of n votes, each -1 or 1.]
- The system is not necessarily majority.
- This is a Boolean function over n variables.

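Voting and influence
Def: the influence of i on f is the probability, over a random input x, that f changes its value when i is flipped: $\mathrm{Influence}_i(f) = \Pr_x[f(x) \ne f(x \triangle \{i\})]$, where x is represented as a set of variables and $x \triangle \{i\}$ flips the membership of i.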

Majority: $\{1, -1\}^n \to \{1, -1\}$
The influence of i on Majority is the probability, over a random input x, that Majority changes with i. This happens when half of the other n-1 coordinates (people) vote -1 and half vote 1, i.e. (for odd n) $\mathrm{Influence}_i(\mathrm{Majority}) = \binom{n-1}{(n-1)/2} 2^{-(n-1)} = \Theta(n^{-1/2})$.

Parity: $\{1, -1\}^n \to \{1, -1\}$, $\mathrm{Parity}(x) = \prod_i x_i$
Flipping i always changes the value of parity, so the influence of every i on Parity is 1.

Dictatorship_i: $\{1, -1\}^n \to \{1, -1\}$, $\mathrm{Dictatorship}_i(x) = x_i$
- The influence of i on Dictatorship_i is 1.
- The influence of $j \ne i$ on Dictatorship_i is 0.

Average Sensitivity
Def: the average sensitivity of f (as) is the sum of the influences of all coordinates $i \in [n]$: $\mathrm{as}(f) = \sum_{i \in [n]} \mathrm{Influence}_i(f)$
- as(Majority) = $O(n^{1/2})$
- as(Parity) = n
- as(dictatorship) = 1
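The three values above can be checked by brute force for a small n. The following is a minimal sketch, assuming inputs are tuples over {1, -1} and using illustrative function names (`influence`, `average_sensitivity`) that do not come from the slides.

```python
from itertools import product
from math import prod

def influence(f, n, i):
    """Fraction of inputs x in {1,-1}^n on which flipping coordinate i changes f."""
    changed = 0
    for x in product([1, -1], repeat=n):
        y = list(x)
        y[i] = -y[i]
        if f(x) != f(tuple(y)):
            changed += 1
    return changed / 2 ** n

def average_sensitivity(f, n):
    """Sum of the influences of all coordinates."""
    return sum(influence(f, n, i) for i in range(n))

def majority(x):        # -1 ("for") wins when it has more than half the votes
    return -1 if sum(x) < 0 else 1

def parity(x):
    return prod(x)

def dictatorship(x):    # the dictator is coordinate 0 (arbitrary choice)
    return x[0]

n = 5
print(average_sensitivity(majority, n))
print(average_sensitivity(parity, n))
print(average_sensitivity(dictatorship, n))
```

For n = 5 each voter is pivotal for Majority exactly when the other four split 2-2, which is why its average sensitivity sits well below n.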

When as(f)=1
Def: f is a balanced function if it equals -1 exactly half of the time: $\mathbb{E}_x[f(x)] = 0$.
Can a balanced f have as(f) < 1? What about as(f) = 1, besides dictatorships?
Prop: f is balanced and as(f) = 1 $\Rightarrow$ f is a dictatorship.

Representing f as a Polynomial
- What would be the monomials over $x \in P[n]$?
- All powers except 0 and 1 cancel out (since $x_i^2 = 1$)!
- Hence, one for each character $S \subseteq [n]$: $\chi_S(x) = \prod_{i \in S} x_i$
- These are all the multiplicative functions.

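Fourier-Walsh Transform
- Consider all characters $\chi_S(x) = \prod_{i \in S} x_i$, $S \subseteq [n]$.
- Given any function $f: \{-1, 1\}^n \to \mathbb{R}$, let the Fourier-Walsh coefficients of f be $\hat f(S) = \langle f, \chi_S \rangle = \mathbb{E}_x[f(x) \chi_S(x)]$
- thus f can be described as $f = \sum_{S \subseteq [n]} \hat f(S) \chi_S$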

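Norms
Def: Expectation norm on the function: $\|f\|_q = (\mathbb{E}_x[|f(x)|^q])^{1/q}$
Def: Summation norm on the transform: $\|\hat f\|_q = (\sum_S |\hat f(S)|^q)^{1/q}$
Thm [Parseval]: $\|f\|_2 = \|\hat f\|_2$
Hence, for a Boolean f: $\sum_S \hat f(S)^2 = 1$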
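To make the transform and Parseval's identity concrete, here is a small brute-force sketch. It assumes the coefficient of a set S is the average of f(x) times the product of the coordinates in S; the helper name `fourier` and the choice of majority on 3 bits are illustrative only.

```python
from itertools import combinations, product
from math import prod

def fourier(f, n):
    """Return {S: coefficient of S}, where the coefficient is E_x[f(x) * chi_S(x)]."""
    points = list(product([1, -1], repeat=n))
    subsets = [S for k in range(n + 1) for S in combinations(range(n), k)]
    return {S: sum(f(x) * prod(x[i] for i in S) for x in points) / len(points)
            for S in subsets}

def majority3(x):
    return 1 if sum(x) > 0 else -1

coeffs = fourier(majority3, 3)

# Parseval for a Boolean function: the squared coefficients sum to 1.
weight = sum(c * c for c in coeffs.values())
print(weight)

# The expansion reproduces f on every input.
for x in product([1, -1], repeat=3):
    value = sum(c * prod(x[i] for i in S) for S, c in coeffs.items())
    assert value == majority3(x)
```

Majority on 3 bits has weight only on the three singletons and on the full set, so it is a convenient example where every coefficient is ±1/2 or 0.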

Distribution over Characters
We may think of the transform of a Boolean f as defining a distribution over the characters: the character $\chi_S$ gets probability $\hat f(S)^2$ (these sum to 1 by Parseval).

Simple Observations
Claim: for any function f whose range is {-1, 0, 1}: $\|f\|_p^p = \mathbb{E}_x[|f(x)|^p] = \Pr_x[f(x) \ne 0]$

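Variables' Influence
- Recall: the influence of an index $i \in [n]$ on a Boolean function $f: \{1, -1\}^n \to \{1, -1\}$ is $\mathrm{Influence}_i(f) = \Pr_x[f(x) \ne f(x \triangle \{i\})]$, which can be expressed in terms of the Fourier coefficients of f.
- Claim: $\mathrm{Influence}_i(f) = \sum_{S \ni i} \hat f(S)^2$
- And the as: $\mathrm{as}(f) = \sum_S |S| \hat f(S)^2$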
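The Fourier expression for influence can be compared with the combinatorial definition on a small example. This sketch assumes the standard identities (influence of i equals the total squared weight of the sets containing i, and the average sensitivity weights each set by its size); all helper names are illustrative.

```python
from itertools import combinations, product
from math import prod

def fourier(f, n):
    """Return {S: E_x[f(x) * chi_S(x)]} over all subsets S of range(n)."""
    points = list(product([1, -1], repeat=n))
    subsets = [S for k in range(n + 1) for S in combinations(range(n), k)]
    return {S: sum(f(x) * prod(x[i] for i in S) for x in points) / len(points)
            for S in subsets}

def influence_direct(f, n, i):
    """Probability over x that flipping coordinate i changes f."""
    points = list(product([1, -1], repeat=n))
    flips = sum(1 for x in points
                if f(x) != f(x[:i] + (-x[i],) + x[i + 1:]))
    return flips / len(points)

def influence_fourier(coeffs, i):
    """Total squared weight on the sets that contain i."""
    return sum(c * c for S, c in coeffs.items() if i in S)

def majority3(x):
    return 1 if sum(x) > 0 else -1

coeffs = fourier(majority3, 3)
for i in range(3):
    print(i, influence_direct(majority3, 3, i), influence_fourier(coeffs, i))

as_fourier = sum(len(S) * c * c for S, c in coeffs.items())
print(as_fourier)
```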

Average Sensitivity
- Def: the sensitivity of x w.r.t. f is $s_f(x) = |\{ i \in [n] : f(x) \ne f(x \triangle \{i\}) \}|$
- Def: the average sensitivity of f is $\mathrm{as}(f) = \mathbb{E}_x[s_f(x)]$

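Fourier Representation of influence
Proof: consider the influence function $f_i(x) = \tfrac12 (f(x) - f(x \triangle \{i\}))$, which in Fourier representation is $f_i = \sum_{S \ni i} \hat f(S) \chi_S$, and $\mathrm{Influence}_i(f) = \Pr_x[f_i(x) \ne 0] = \|f_i\|_2^2 = \sum_{S \ni i} \hat f(S)^2$.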

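Restriction and Average
Def: Let $I \subseteq [n]$ and $x \in P([n] \setminus I)$. The restriction function is $F_I[x]: P(I) \to \mathbb{R}$, $F_I[x](y) = f(x \cup y)$.
[Figure: an input split into the part $y \in P(I)$ and the fixed part $x \in P([n] \setminus I)$.]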

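Average function
Def: the average function is $A_I[x] = \mathbb{E}_{y \in P(I)}[F_I[x](y)] = \mathbb{E}_y[f(x \cup y)]$, for $x \in P([n] \setminus I)$.
[Figure: the inputs $x \cup y$, averaged over all $y \in P(I)$.]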

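In Fourier Expansion
Prop: $\widehat{F_I[x]}(S) = \sum_{T \subseteq [n] \setminus I} \hat f(S \cup T) \chi_T(x)$ for $S \subseteq I$.
The polynomial representation of $F_I[x]$ is in the variables of I (since $x \in P([n] \setminus I)$ is fixed). The coefficient of $S \subseteq I$ is the sum of the coefficients of all characters for which, after fixing the values of $x \in P([n] \setminus I)$, the only remaining variables are those in S.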

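In Fourier Expansion
- Recall: $\widehat{F_I[x]}(S) = \sum_{T \subseteq [n] \setminus I} \hat f(S \cup T) \chi_T(x)$
- Since the expectation of a function is its coefficient on the empty character:
- Corollary 1: $A_I[x] = \sum_{T \subseteq [n] \setminus I} \hat f(T) \chi_T(x)$
- Corollary 2: $\mathrm{Influence}_i(f) = 1 - \|A_{\{i\}}\|_2^2 = \sum_{S \ni i} \hat f(S)^2$
  (Here $P[\{i\}] = \{\emptyset, \{i\}\}$ and $A_{\{i\}}[x] \in \{-1, 0, 1\}$; the proof uses Parseval, Corollary 1, and the fact that the sum of squares of the coefficients of a Boolean function equals 1.)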

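Expectation and Variance
- Recall: $\mathbb{E}[f] = \hat f(\emptyset)$
- Hence, for any f: $\mathrm{Var}[f] = \mathbb{E}[f^2] - \mathbb{E}[f]^2 = \sum_{S \ne \emptyset} \hat f(S)^2$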

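Balanced f s.t. as(f)=1 is Dict.
- Since f is balanced, $\hat f(\emptyset) = 0$.
- If there is an S s.t. $|S| > 1$ and $\hat f(S) \ne 0$, then $\mathrm{as}(f) = \sum_S |S| \hat f(S)^2 > \sum_S \hat f(S)^2 = 1$.
- So f is linear: $f = \sum_i \hat f(\{i\}) x_i$.
- For any i s.t. $\hat f(\{i\}) \ne 0$: flipping $x_i$ (only i has changed) changes f by $2 \hat f(\{i\}) x_i$, which must be $\pm 2$, so $\hat f(\{i\}) = \pm 1$. By Parseval all other coefficients vanish, and f is a dictatorship (up to sign).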

Codes and Boolean Functions
Def: an m-bit code is a subset of the set of all m-bit binary strings: $C \subseteq \{-1, 1\}^m$.
The distance of a code C is the minimum, over all pairs of legal-words (in C), of the Hamming distance between the two words.
A Boolean function over n binary variables is a $2^n$-bit string. Hence, a set of Boolean functions can be considered as a $2^n$-bit code.

Hadamard Code
In the Hadamard code the set of legal-words consists of all multiplicative functions (linear if over {0, 1}): $C = \{\chi_S \mid S \subseteq [n]\}$, namely all characters (make sure you know why).

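Characters and multiplicative
Claim: Characters are all the multiplicative functions.
Proof: Let f be a multiplicative function, i.e. $f(xy) = f(x) f(y)$. Let $S = \{ i \mid f(1, 1, \ldots, -1, 1, \ldots, 1) = -1 \}$, where the single $-1$ is in the i'th place. Every x is a product of such vectors, so $\chi_S = f$.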

Hadamard Test
Given a Boolean f, choose random x and y; check that $f(x) f(y) = f(xy)$, where xy is the coordinate-wise product.
Prop (perfect completeness): a legal Hadamard word (a character) always passes this test.

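Hadamard Test – Soundness
Prop (soundness): if f passes the test with probability at least $(1+\delta)/2$, then there is a character S with $\hat f(S) \ge \delta$.
Proof: if $f(x) f(y) = f(xy)$ then $f(x) f(y) f(xy) = 1$, else $f(x) f(y) f(xy) = -1$. Hence
$\mathbb{E}_{x,y}[f(x) f(y) f(xy)] = 1 \cdot \Pr[f(x) f(y) = f(xy)] - 1 \cdot \Pr[f(x) f(y) \ne f(xy)] \ge (1+\delta)/2 - (1-\delta)/2 = \delta$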

proof

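Proof cont.
Conclusion: $\sum_S \hat f(S)^3 \ge \delta \Rightarrow \exists S$ s.t. $\hat f(S) \ge \delta$
- Proof 1 (probabilistic method): consider a random variable that takes the value $\hat f(S)$ with probability $\hat f(S)^2$. Its expectation is $\ge \delta$, so one of its values is $\ge \delta$.
- Proof 2 (algebraic): if $\hat f(S) < \delta$ for all S, then $\sum_S \hat f(S)^3 < \delta \sum_S \hat f(S)^2 = \delta$.
What about ?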
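The soundness argument rests on the identity that the expectation of f(x)f(y)f(xy) equals the sum of the cubed Fourier coefficients, so the test passes with probability one half plus half that sum. The sketch below checks this by full enumeration on 3 bits; the function names are illustrative, and the enumeration stands in for the random choice of x and y.

```python
from itertools import combinations, product
from math import prod

def fourier(f, n):
    """Return {S: E_x[f(x) * chi_S(x)]} over all subsets S of range(n)."""
    points = list(product([1, -1], repeat=n))
    subsets = [S for k in range(n + 1) for S in combinations(range(n), k)]
    return {S: sum(f(x) * prod(x[i] for i in S) for x in points) / len(points)
            for S in subsets}

def hadamard_acceptance(f, n):
    """Exact probability, over uniform x and y, that f(x) f(y) = f(xy)."""
    points = list(product([1, -1], repeat=n))
    passed = sum(1 for x in points for y in points
                 if f(x) * f(y) == f(tuple(a * b for a, b in zip(x, y))))
    return passed / len(points) ** 2

def majority3(x):
    return 1 if sum(x) > 0 else -1

def character(x):          # chi_S for S = {0, 2}
    return x[0] * x[2]

cubes = sum(c ** 3 for c in fourier(majority3, 3).values())
print(hadamard_acceptance(majority3, 3), 0.5 + 0.5 * cubes)
print(hadamard_acceptance(character, 3))
```

A character passes with probability 1 (perfect completeness), while majority on 3 bits, whose largest coefficient is 1/2, passes noticeably less often.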

Long-Code In the long-code the set of legal-words consists of all monotone dictatorships This is the most extensive binary code, as its bits represent all possible binary values over n elements

Long-Code
Encoding an element $e \in [n]$: $E_e$ legally-encodes an element e if $E_e = f_e$.
[Figure: a truth table with entries F F T T T.]

Testing Long-code
Def (a long-code list-test): given a code-word f, probe it in a constant number of entries, and
- Completeness (not perfect): accept almost always if f is a monotone dictatorship
- Soundness: reject w.h.p. if f does not have a sizeable fraction of its Fourier weight concentrated on a small set of variables, that is, if $\nexists$ a semi-junta $J \subseteq [n]$ carrying that weight
Note: a long-code list-test distinguishes between the case where f is a dictatorship and the case where f is far from a junta.

Motivation – Testing Long-code
- Long-code list-tests are essential tools in proving hardness results.
- Hence, finding simple sufficient conditions for a function to be a junta is important.

What about a Hadamard-like test?
- Completeness? Yes.
- Soundness? We would like something like the soundness of the Hadamard test. But who will pass the test? All the characters, for a start, and many more… so no.

Perturbation
Def: denote by $\mu_\epsilon$ the distribution over all subsets of [n] which assigns probability to a subset x as follows: independently, for each $i \in [n]$, let
- $i \notin x$ with probability $1 - \epsilon$
- $i \in x$ with probability $\epsilon$

Long-Code Test
Given a Boolean f, choose random x and y, and choose $z \sim \mu_\epsilon$; check that $f(x) f(y) = f(xyz)$.
Prop (completeness): a legal long-code word (a dictatorship) passes this test w.p. $1 - \epsilon$.
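The completeness claim can be checked exactly on a few bits. This is a minimal sketch under two assumptions that the extracted slide does not spell out: the perturbation z puts a -1 in each coordinate independently with a small probability (called `eps` here), and xyz denotes the coordinate-wise product. The function name is illustrative.

```python
from itertools import product

def long_code_acceptance(f, n, eps):
    """Exact probability that f(x) f(y) = f(xyz) for uniform x, y and perturbation z."""
    points = list(product([1, -1], repeat=n))
    total = 0.0
    for z in points:
        k = z.count(-1)                        # number of perturbed coordinates
        weight = eps ** k * (1 - eps) ** (n - k)
        hits = sum(1 for x in points for y in points
                   if f(x) * f(y) == f(tuple(a * b * c for a, b, c in zip(x, y, z))))
        total += weight * hits / len(points) ** 2
    return total

def dictatorship(x):
    return x[1]

def constant_one(x):
    return 1

eps = 0.1
print(long_code_acceptance(dictatorship, 3, eps))   # close to 1 - eps
print(long_code_acceptance(constant_one, 3, eps))   # the constant function always passes
```

The second line illustrates why the plain test cannot be list-decoded: the constant function passes even though it is not a dictatorship.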

Long-code Test – Soundness
Prop (soundness):
Proof:

Proof cont. n Try to find k s. t. to get K can be determined according to and the size of the group.

List decoding
- The test does not give us the ability to list-decode the function.
- Problem: the constant function passes the test, as do other words that are not close to a legal code-word.
- Solution (?): assume f is folded, i.e. $f(x) = -f(-x)$ for every x. (Make sure you understand why this is a "solution".)

Junta Test
(1) Definitions
(2) Independence test
(3) The size test
(4) Soundness and completeness of the tests

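Definitions: Variation
Def: the variation of f (an extension of influence) on a set $I \subseteq [n]$ is $\mathrm{Variation}_I(f) = \mathbb{E}_{x \in P([n] \setminus I)}[\mathrm{Var}_{y \in P(I)}[f(x \cup y)]]$
Intuition: if I is very influential in f, then the function will go "wild" on $y \in P[I]$, meaning that the variance (and hence the variation) will be big as well.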

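Variation cont.
Prop: the following are equivalent definitions of the variation of f:
$\mathrm{Variation}_I(f) = \mathbb{E}_x[\mathrm{Var}_y[f(x \cup y)]] = \|f\|_2^2 - \|A_I\|_2^2 = \sum_{S \cap I \ne \emptyset} \hat f(S)^2$

- Recall the variance of f: $\mathrm{Var}[f] = \mathbb{E}[f^2] - \mathbb{E}[f]^2$
- To get $\mathrm{Variation}_I(f) = \mathbb{E}_x[\mathbb{E}_y[f(x \cup y)^2] - A_I[x]^2] = \|f\|_2^2 - \|A_I\|_2^2$

Proof – Cont.
- Recall $A_I[x] = \sum_{S \cap I = \emptyset} \hat f(S) \chi_S(x)$
- Therefore (by Parseval): $\mathrm{Variation}_I(f) = \sum_S \hat f(S)^2 - \sum_{S \cap I = \emptyset} \hat f(S)^2 = \sum_{S \cap I \ne \emptyset} \hat f(S)^2$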

High vs Low Frequencies
Def: The section of a function f above k is $f^{>k} = \sum_{|S| > k} \hat f(S) \chi_S$ and the low-frequency portion is $f^{\le k} = \sum_{|S| \le k} \hat f(S) \chi_S$

Junta Test
Def: A Junta test is as follows:
- A distribution over l queries
- For each l-tuple, a local test that either accepts or rejects: $T[x_1, \ldots, x_l]: \{1, -1\}^l \to \{T, F\}$
s.t. the test accepts w.h.p. for a j-junta f, whereas it rejects w.h.p. for any f which is not an $(\epsilon, j)$-junta.
The number of queries l will be polynomial in $j/\epsilon$.

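Fourier Representation of influence
Recall: consider the I-average function, averaging over $y \in P[I]$: $A_I[x] = \mathbb{E}_{y \in P(I)}[f(x \cup y)]$, which in Fourier representation is $A_I = \sum_{S \cap I = \emptyset} \hat f(S) \chi_S$, and $\mathrm{Variation}_I(f) = \|f\|_2^2 - \|A_I\|_2^2 = \sum_{S \cap I \ne \emptyset} \hat f(S)^2$.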

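Subsets' Influence
Recall: The variation of a subset $I \subseteq [n]$ on a Boolean function f is $\mathrm{Variation}_I(f) = \sum_{S \cap I \ne \emptyset} \hat f(S)^2$ and the low-frequency influence is $\mathrm{Variation}_I^{\le k}(f) = \sum_{S \cap I \ne \emptyset, |S| \le k} \hat f(S)^2$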

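Independence-Test
The I-independence-test on a Boolean function f is: for random $w \in P([n] \setminus I)$ and $z_1, z_2 \in P(I)$, check that $f(w \cup z_1) = f(w \cup z_2)$.
Lemma: $\Pr[f \text{ fails the I-independence-test}] = \tfrac12 \mathrm{Variation}_I(f)$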
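The relation between the independence-test and the variation can be verified exactly for a small function. The sketch assumes the test fixes the coordinates outside I, draws the coordinates inside I twice, and rejects when the two values of f differ; the variation is taken to be the expected variance of f over the coordinates in I. Names such as `assemble` and `rejection_probability` are illustrative.

```python
from itertools import product

def assemble(n, I, w, z):
    """Build a full input: coordinates in I are read from z, the rest from w."""
    wi, zi = iter(w), iter(z)
    return tuple(next(zi) if i in I else next(wi) for i in range(n))

def restricted_values(f, n, I, w):
    """Values of f over all assignments to I, with the other coordinates fixed to w."""
    return [f(assemble(n, I, w, z)) for z in product([1, -1], repeat=len(I))]

def variation(f, n, I):
    """Expected (over w) variance of f over the coordinates in I."""
    rest = n - len(I)
    total = 0.0
    for w in product([1, -1], repeat=rest):
        vals = restricted_values(f, n, I, w)
        mean = sum(vals) / len(vals)
        total += sum(v * v for v in vals) / len(vals) - mean ** 2
    return total / 2 ** rest

def rejection_probability(f, n, I):
    """Exact probability that two independent fillings of I give different values."""
    rest = n - len(I)
    total = 0.0
    for w in product([1, -1], repeat=rest):
        vals = restricted_values(f, n, I, w)
        total += sum(1 for a in vals for b in vals if a != b) / len(vals) ** 2
    return total / 2 ** rest

def majority3(x):
    return 1 if sum(x) > 0 else -1

I = {0, 1}
print(variation(majority3, 3, I), rejection_probability(majority3, 3, I))
```

For a fixed w, if f equals 1 on a fraction q of the fillings, the test rejects with probability 2q(1-q) while the variance is 4q(1-q), which is where the factor of one half comes from.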

proof

What was I looking for?

Junta Test
The junta-size test JT on a Boolean function f is:
- Randomly partition [n] into $I_1, \ldots, I_r$, for $r \gg j^2$
- Run the independence-test t times on each $I_h$, for $t \gg j^2/\epsilon$
- Accept if no more than j of the $I_h$ fail their independence-tests
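The three steps above can be sketched as a randomized procedure. This is an illustration under assumptions, not the slides' exact test: each index is assigned to a uniformly random part, the independence-test re-randomizes the coordinates of a part, and the parameters r and t are passed in by the caller rather than derived from j and the closeness parameter.

```python
import random
from math import prod

def independence_test(f, n, I, rng):
    """One run: re-randomize the coordinates in I and compare the two values of f."""
    x = [rng.choice([1, -1]) for _ in range(n)]
    y = list(x)
    for i in I:
        y[i] = rng.choice([1, -1])
    return f(tuple(x)) == f(tuple(y))

def junta_size_test(f, n, j, r, t, rng):
    """Accept iff at most j of the r random parts fail one of their t independence-tests."""
    parts = [[] for _ in range(r)]
    for i in range(n):
        parts[rng.randrange(r)].append(i)      # random partition of the indices
    failed = sum(1 for I in parts
                 if not all(independence_test(f, n, I, rng) for _ in range(t)))
    return failed <= j

def dictatorship(x):
    return x[2]

def parity(x):
    return prod(x)

rng = random.Random(0)
print(junta_size_test(dictatorship, 12, j=1, r=4, t=20, rng=rng))   # a 1-junta is accepted
print(junta_size_test(parity, 12, j=1, r=4, t=20, rng=rng))
```

A 1-junta is always accepted because only the part holding its single relevant index can ever fail; parity depends on every index, so almost surely every non-empty part fails.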

Completeness
For a j-junta f, only those sets which contain an index of the junta can fail the independence-test, so no more than j sets can fail the test.

Soundness
If f passes the test w.p. 1/2, then f is an $(\epsilon, j)$-junta.
Proof: We will prove some claims derived from bounds on the variation of a set that passes the junta test.
Intuition: a set $I_h$ fails a single independence-test with probability ½·Variation. If a set passes the test t times, we expect ½·Variation < 1/t; otherwise it would be expected to fail within t tests.

Formally (if you insist):
- The probability that the set $I_h$ fails a single test is $p = \tfrac12 \mathrm{Variation}_{I_h}(f)$.
- The probability that $I_h$ passes all t tests is $(1-p)^t$.
- If this happens w.h.p., then $e^{-pt} \ge (1-p)^t > 1/2 \;\Rightarrow\; -pt > \ln(1/2) \;\Rightarrow\; p < \tfrac{1}{t}(-\ln(1/2)) < \tfrac{1}{t}$

Assume the premise. Fix $\tau > 1/t$ and let $J = \{ i \in [n] \mid \mathrm{Variation}_{\{i\}}(f) > \tau \}$.
By using the bound on p we will prove that if f passes the test then f is close to a J-junta.
Prop: if the size test succeeds w.p. > 1/2, then $|J| \le j$.
Proof: otherwise, J spreads among the $I_h$ w.h.p., and for any $I_h$ s.t. $I_h \cap J \ne \emptyset$ it must be that $\mathrm{Variation}_{I_h}(f) > \tau$.

- For a random partition, a member of J will be in $I_h$ w.p. 1/r.
- From the birthday problem we know that for $r = j^2$ we won't get two members of J in the same $I_h$, for j variables from J.
- We will choose r such that w.p. ¾ at least j+1 members of J are spread in different $I_h$'s.

- For every such $I_h$ and for a fixed $i \in I_h \cap J$: $\mathrm{Variation}_{I_h}(f) \ge \mathrm{Variation}_{\{i\}}(f)$. And since $I_h$ contains a variable from J, we conclude that $\mathrm{Variation}_{I_h}(f) > \tau$.
- Since $j^2/\epsilon \ll t$ we can choose $\tau$ s.t. $\tau > \tfrac{2}{t}[\ln(j+1) + \ln 4]$.
- Now for any random partition one (like you) can bound the probability that one of the j+1 parts containing members of J passes its t independence-tests, using the union bound: $(j+1)(1 - \tau/2)^t \le (j+1) e^{-\tau t/2} < 1/4$.
- The probability that the size test succeeds is less than ¼ (J does not spread between j+1 I's) + ¾ · ¼ (J spreads between j+1 I's and the test succeeds) < ½, a contradiction to the assumption that the size test succeeds w.p. > ½.

Where are we?
We concluded that if the size test succeeds w.p. > 1/2 then $|J| \le j$.
- Now what? We will show that almost all the weight of f is concentrated on J.
- How?
(1) Show that the total weight on the high frequencies is small.
(2) Show that the total weight of the low frequencies on the characters outside J is small.

(1) High Frequencies Contribute Little
Prop: $k \gg r \log r$ implies $\|f^{>k}\|_2^2 = \sum_{|S| > k} \hat f(S)^2 \le \epsilon/4$
Proof: From the Coupon Collector Problem, a character S of size larger than k spreads w.h.p. (> 3/4) over all parts $I_h$ (appears in every $I_h$), hence contributes to the influence of all parts. Under this event, every $I_h$ contains a member of S (for S such that |S| > k).

High frequencies cont.
- Use the union bound to bound the probability that one of the first j+1 groups passes its independence-tests.
- This probability is less than 1/4: since $8 j^2/\epsilon < t$, we get $t \epsilon/8 > j^2 > \ln(j+1) + \ln 4$.
- Hence, w.p. at least $(3/4)^2 = 9/16$ (the probability of spreading over all $I_h$, times the probability that the first j+1 groups fail the size test) the junta test fails: a contradiction.

(2)Almost all Weight is on J Lemma: Proof: assume by way of contradiction otherwise since for a random partition w. h. p. (>3/4) ( by a Chernoff like bound – ( i influencei < )) for every h however, since for any I And also

n n n Similar to the last claim, the probability to fail the test in such an event is at least ¾. the test fails w. p > ½ Note: for this union bound t=200 rk/ [ln(j+1)+ln 4] contradiction

Find the Close Junta Now, since consider the (non Boolean) which, if rounded outside J

n n n The distance of f’ (the closest Boolean function to f) is no more than f because it is the minimum distance. According to triangle inequality

applications

First Passage Percolation [BKS] Each edge costs a w/probability ½ and b w/probability ½

First Passage Percolation
- Consider the grid $\mathbb{Z}^d$.
- For each edge e of $\mathbb{Z}^d$ choose independently $w_e = 1$ or $w_e = 2$, each with probability ½.
- This induces a shortest-path metric on $\mathbb{Z}^d$.
Thm: The variance of the shortest path from the origin to vertex v is bounded from above by $O(|v| / \log |v|)$ [BKS]
Proof idea: The average sensitivity of shortest-path is bounded by that term.

Proof outline n n Let G denote the grid SPG – the shortest path in G from the origin to v. Let denote the Grid which differ from G only on we i. e. flip the value of e in G. Set

Observation
If e participates in a shortest path, then flipping its value will increase or decrease the SP by at most 1; if e is not in the SP, the SP will not change.

Proof cont.
And by [KKL] there is at least one variable whose influence is at least $\Omega(\log n / n)$.

Graph properties
Def: A graph property is a subset of graphs invariant under isomorphism.
Def: a monotone graph property is a graph property P s.t. if P(G), then for every super-graph H of G (namely, a graph on the same set of vertices which contains all edges of G) P(H) holds as well.
P is in fact a Boolean function: $P: \{-1, 1\}^{\binom{V}{2}} \to \{-1, 1\}$

Examples of graph properties
- G is connected
- G is Hamiltonian
- G contains a clique of size t
- G is not planar
- The clique number of G is larger than that of its complement
- The diameter of G is at most s
- ... etc.
What is the influence of different edges e on P?

Erdős–Rényi G(n, p) Graph
The Erdős–Rényi distribution of random graphs: put an edge between any two vertices w.p. p, independently.

Definitions
- P: a graph property
- $\mu_p(P)$: the probability that a random graph on n vertices with edge probability p satisfies P
- $G \in G(n, p)$: G is a random graph on n vertices with edge probability p
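For very small n the probability that G(n, p) satisfies a property can be computed exactly by summing over all graphs. The sketch below does this for connectivity on 4 vertices; the names `mu` and `is_connected` are illustrative, and the enumeration is only feasible for a handful of vertices.

```python
from itertools import combinations

def is_connected(n, edges):
    """Depth-first search from vertex 0; connected iff every vertex is reached."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == n

def mu(prop, n, p):
    """Exact probability that a graph from G(n, p) satisfies prop."""
    pairs = list(combinations(range(n), 2))
    total = 0.0
    for mask in range(2 ** len(pairs)):
        edges = [e for b, e in enumerate(pairs) if mask >> b & 1]
        if prop(n, edges):
            total += p ** len(edges) * (1 - p) ** (len(pairs) - len(edges))
    return total

for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, mu(is_connected, 4, p))
```

Since connectivity is monotone, the printed probabilities increase with p; on larger vertex sets the increase concentrates in a narrow window, which is the sharp-threshold phenomenon.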

Example: Max Clique
Consider $G \in G(n, p)$ (n is the number of vertices, p the probability of choosing an edge).
- The size of the interval of probabilities p for which the clique number of G is almost surely k (where $k \approx \log n$) is of order $\log^{-1} n$.
- The threshold interval: the transition between clique numbers k-1 and k.

[Figure: at one end of the threshold interval the probability of having a clique of size k is $\epsilon$; at the other end it is $1 - \epsilon$.]
- The probability of having a (k+1)-clique is still small ($\approx \log^{-1} n$).
- The value of p must increase by $c \log^{-1} n$ before the probability of having a (k+1)-clique reaches $\epsilon$ and another transition interval begins.

Def: Sharp threshold
Sharp threshold in a monotone graph property: the transition from the property being very unlikely to it being very likely is very swift.
[Figure: the probability of the property as a function of p, moving sharply from "G does not satisfy property P" to "G satisfies property P".]

Thm: every monotone graph property has a sharp threshold [FK]
Let P be any monotone property of graphs on n vertices. If $\mu_p(P) > \epsilon$ then $\mu_q(P) > 1 - \epsilon$ for $q = p + c_1 \log(1/(2\epsilon)) / \log n$.
Proof idea: show that $\mathrm{as}_{p'}(P)$, for $p' > p$, is high.

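Thm [Margulis-Russo]: For monotone f, $\frac{d \mu_p(f)}{dp} = \mathrm{as}_p(f)$
Hence
Lemma: For monotone f, $\forall \epsilon > 0$, $\exists q \in [p, p + \epsilon]$ s.t. $\mathrm{as}_q(f) \le 1/\epsilon$
Proof: Otherwise $\mu_{p+\epsilon}(f) > 1$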
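The Margulis-Russo identity can be checked numerically on a tiny monotone function. This sketch makes two assumptions for readability: inputs are 0/1 vectors with each bit equal to 1 with probability p, and the p-biased average sensitivity is the sum over coordinates of the probability that the coordinate is pivotal. The function names are illustrative.

```python
from itertools import product

def mu(f, n, p):
    """Probability that f holds when each bit is 1 independently with probability p."""
    return sum(p ** sum(x) * (1 - p) ** (n - sum(x))
               for x in product([0, 1], repeat=n) if f(x))

def as_p(f, n, p):
    """Sum over coordinates i of the p-biased probability that i is pivotal."""
    total = 0.0
    for i in range(n):
        for x in product([0, 1], repeat=n):
            if x[i] == 1:
                continue                      # visit each setting of the other bits once
            raised = x[:i] + (1,) + x[i + 1:]
            if f(raised) != f(x):
                k = sum(x)                    # ones among the other n-1 coordinates
                total += p ** k * (1 - p) ** (n - 1 - k)
    return total

def majority3(x):
    return sum(x) >= 2

p, h = 0.3, 1e-6
derivative = (mu(majority3, 3, p + h) - mu(majority3, 3, p - h)) / (2 * h)
print(derivative, as_p(majority3, 3, p))
```

For majority on 3 bits the probability is 3p² - 2p³, whose derivative 6p(1-p) matches three coordinates that are each pivotal with probability 2p(1-p).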

Proof [Margulis-Russo]:

Mechanism Design Problem
- N agents; each agent i has a private input $t_i \in T$. All other information is public knowledge.
- Each agent i has a valuation for all items.
- Each agent wishes to optimize her own utility.
- Objective: minimize the objective function, the total payment.
- Means: a protocol between agents and auctioneer.

Vickrey-Clarke-Groves (VCG)
- Sealed-bid auction
- A truth-revealing protocol, namely, one in which each agent might as well reveal her valuation to the auctioneer
- Whereby each agent gets the best (for her) price she could have bid and still win the auction

Shortest Path using VCG
- Problem definition: a communication network modeled by a directed graph G and two vertices, source s and target t.
- Agents = edges in G.
- Each agent has a cost for sending a single message on her edge, denoted by $t_e$.
- Objective: find the shortest (cheapest) path from s to t.
- Means: a protocol between agents and auctioneer.

VCG for Shortest-Path
[Figure: an example network with edge costs of 10$ and 50$; one edge is always in the shortest path.]

How much will we pay?
[Figure: a network with edge costs of 1$ and 2$, with the shortest path SP marked.]
Every agent will get $1 more.
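A small computation shows how the payments exceed the declared costs. The sketch assumes the usual VCG payment rule for this setting, which the slides only describe in words: an edge on the shortest path is paid the highest cost it could have declared while staying on the shortest path, i.e. the shortest-path cost without that edge minus the cost of the rest of the chosen path. The graph and all names are hypothetical.

```python
import heapq

def shortest_path(edges, s, t):
    """Dijkstra on a directed graph given as {(u, v): cost}; returns (cost, path edges)."""
    adj = {}
    for (u, v), c in edges.items():
        adj.setdefault(u, []).append((v, c))
    dist, prev = {s: 0}, {}
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                              # stale heap entry
        for v, c in adj.get(u, []):
            if d + c < dist.get(v, float("inf")):
                dist[v] = d + c
                prev[v] = u
                heapq.heappush(heap, (d + c, v))
    if t not in dist:
        return float("inf"), []
    path, v = [], t
    while v != s:
        path.append((prev[v], v))
        v = prev[v]
    return dist[t], path[::-1]

def vcg_payments(edges, s, t):
    """Payment to each edge on the shortest path (assumed VCG rule, see lead-in)."""
    cost, path = shortest_path(edges, s, t)
    payments = {}
    for e in path:
        without_e = {k: c for k, c in edges.items() if k != e}
        alternative, _ = shortest_path(without_e, s, t)
        payments[e] = alternative - (cost - edges[e])
    return payments

# Hypothetical network: a two-edge path of cost 2 and a direct edge of cost 3.
edges = {("s", "a"): 1, ("a", "t"): 1, ("s", "t"): 3}
print(vcg_payments(edges, "s", "t"))
```

In this toy network each edge on the path declares a cost of 1 and is paid 2, so the total payment of 4 exceeds the true path cost of 2.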

Noise Sensitivity
- The values of the variables may each, independently, flip with probability $\lambda$.
- It turns out: one cannot design an f that would be robust to such noise (that is, would, on average, change value w.p. < O(1)) unless the outcome is determined according to very few of the voters.

Juntas
- A function is a J-junta if its value depends on only J variables.
- A dictatorship is a 1-junta.
[Figure: a row of votes in which only a few coordinates determine the outcome.]

$\lambda$-Noise sensitivity
The noise sensitivity of a function f is the probability that f changes its value when a random subset of its variables, drawn according to the perturbation distribution, is flipped.
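Noise sensitivity can be computed exactly on a few bits, which makes the contrast between a junta and a function of many variables visible. The sketch assumes the noise model stated earlier in the slides, where each variable flips independently with a fixed probability (called `lam` here, a name chosen for this example); the function names are illustrative.

```python
from itertools import product
from math import prod

def noise_sensitivity(f, n, lam):
    """Exact probability that f changes when each coordinate flips w.p. lam."""
    points = list(product([1, -1], repeat=n))
    total = 0.0
    for z in points:                       # z[i] == -1 marks a flipped coordinate
        k = z.count(-1)
        weight = lam ** k * (1 - lam) ** (n - k)
        changed = sum(1 for x in points
                      if f(x) != f(tuple(a * b for a, b in zip(x, z))))
        total += weight * changed / len(points)
    return total

def dictatorship(x):
    return x[0]

def parity(x):
    return prod(x)

lam = 0.1
print(noise_sensitivity(dictatorship, 4, lam))   # only the dictator's flip matters
print(noise_sensitivity(parity, 4, lam))         # any odd number of flips changes parity
```

The dictatorship changes value only when its single variable flips, while parity on 4 bits changes whenever an odd number of variables flip, which is already about three times as likely at this noise rate.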

Noise sensitivity and juntas
Juntas are noise insensitive (stable).
Thm [Bourgain; Kindler & S]: Noise insensitive (stable) Boolean functions are juntas.

Noise-Sensitivity – Cont.
- Advantage: very efficiently testable (using only two queries) by a perturbation-test.
- Def (perturbation-test): choose $x \sim \mu_p$ and $y \sim \mu_{\lambda, p, x}$; check whether f(x) = f(y). The success is proportional to the noise-sensitivity of f.
- Prop: the $\lambda$-noise-sensitivity is given by a formula in the Fourier coefficients of f.

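Relation between Parameters
Prop: small ns $\Rightarrow$ small high-freq weight
Proof: the noise sensitivity is determined by the Fourier weights, with larger characters contributing more; therefore, if ns is small, the high frequencies must have small weights.
Prop: small as $\Rightarrow$ small high-freq weight
Proof: $\mathrm{as}(f) = \sum_S |S| \hat f(S)^2 \ge k \sum_{|S| > k} \hat f(S)^2$, hence $\|f^{>k}\|_2^2 \le \mathrm{as}(f)/k$.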

High vs. Low Frequencies
Def: The section of a function f above k is $f^{>k} = \sum_{|S| > k} \hat f(S) \chi_S$ and the low-frequency portion is $f^{\le k} = \sum_{|S| \le k} \hat f(S) \chi_S$

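Low-degree B.f. are Juntas [KS]
Theorem: $\exists$ a constant $\gamma > 0$ s.t. any Boolean function $f: P([n]) \to \{-1, 1\}$ whose weight above level k, $\|f^{>k}\|_2^2$, is sufficiently small is an $[\epsilon, j]$-junta for $j = O(\epsilon^{-2} k^3 2^k)$.
Corollary: fix a p-biased distribution $\mu_p$ over P([n]). Let $\lambda > 0$ be any parameter and set $k = \log_{1-\lambda}(1/2)$. Then $\exists$ a constant $\gamma > 0$ s.t. any Boolean function $f: P([n]) \to \{-1, 1\}$ with sufficiently small $\lambda$-noise-sensitivity is an $[\epsilon, j]$-junta for $j = O(\epsilon^{-2} k^3 2^k)$.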

Friedgut Theorem
Thm: any Boolean f is an $[\epsilon, j]$-junta for $j = 2^{O(\mathrm{as}(f)/\epsilon)}$
Proof:
1. Specify the junta J
2. Show the complement of J has little influence


Codes and Boolean Functions
Def: an m-bit code is a subset of the set of all m-bit binary strings: $C \subseteq \{-1, 1\}^m$.
The distance of a code C is the minimum, over all pairs of legal-words (in C), of the Hamming distance between the two words.
Note: A Boolean function over n binary variables is a $2^n$-bit string. Hence, a set of Boolean functions can be considered as a $2^n$-bit code.

Long-Code: Monotone-Dictatorship
In the long-code, the legal code-words are all monotone dictatorships: $C = \{\chi_{\{i\}} \mid i \in [n]\}$, namely, all the singleton characters.

Where to go for Dinner?
- The alternatives
- Diners would cast their vote in an (electronic) envelope
- The system would decide, not necessarily according to majority… (influence, power)
- And what if someone (in Florida?) can flip some votes?
Of course they'll have to discuss it over dinner…

Open Questions
- Mechanism Design: show a non truth-revealing protocol in which the pay is smaller (Nash equilibrium when all agents tell the truth?)
- Hardness of Approximation:
  - MAX-CUT
  - Coloring a 3-colorable graph with fewest colors
- Graph Properties: find sharp thresholds for properties
- Analysis: show the weakest condition for a function to be a junta
- Apply Concentration of Measure techniques to other problems in Complexity Theory

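Specify the Junta
Set $k = \Theta(\mathrm{as}(f)/\epsilon)$ and $\tau = 2^{-\Theta(k)}$.
Let $J = \{ i \mid \mathrm{Influence}_i(f) > \tau \}$.
We'll prove that f is close to a function that depends only on J; hence f is an $[\epsilon, j]$-junta, and $|J| = 2^{O(k)}$.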

Hadamard Code
In the Hadamard code the set of legal-words consists of all multiplicative (linear if over {0, 1}) functions: $C = \{\chi_S \mid S \subseteq [n]\}$, namely all characters.

Hadamard Test – Soundness
Prop (soundness):
Proof:

Testing Long-code
Def (a long-code list-test): given a code-word f, probe it in a constant number of entries, and
- accept almost always if f is a monotone dictatorship
- reject w.h.p. if f does not have a sizeable fraction of its Fourier weight concentrated on a small set of variables, that is, if $\nexists$ a semi-junta $J \subseteq [n]$ carrying that weight
Note: a long-code list-test distinguishes between the case where f is a dictatorship and the case where f is far from a junta.


High Frequencies Contribute Little
Prop: $k \gg r \log r$ implies $\|f^{>k}\|_2^2 \le \epsilon/4$
Proof: a character S of size larger than k spreads w.h.p. over all parts $I_h$, hence contributes to the influence of all parts. If such characters were heavy ($> \epsilon/4$), then surely there would be more than j parts $I_h$ that fail the t independence-tests.

Altogether Lemma: Proof:


Beckner/Nelson/Bonami Inequality
Def: let $T_\delta$ be the following operator on any f: $T_\delta[f] = \sum_S \delta^{|S|} \hat f(S) \chi_S$
Prop:
Proof:

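Beckner/Nelson/Bonami Inequality
Def: let $T_\delta$ be the following operator on any f: $T_\delta[f] = \sum_S \delta^{|S|} \hat f(S) \chi_S$
Thm: for any $p \ge r$ and $\delta \le ((r-1)/(p-1))^{1/2}$: $\|T_\delta[f]\|_p \le \|f\|_r$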
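The inequality can be sanity-checked numerically on a small real-valued function. The sketch assumes the operator damps each Fourier coefficient by the parameter raised to the size of the set, and uses expectation norms; the test function and the choice p = 4, r = 2 are arbitrary, and the helper names are illustrative.

```python
from itertools import combinations, product
from math import prod, sqrt

def fourier(f, n):
    """Return {S: E_x[f(x) * chi_S(x)]} over all subsets S of range(n)."""
    points = list(product([1, -1], repeat=n))
    subsets = [S for k in range(n + 1) for S in combinations(range(n), k)]
    return {S: sum(f(x) * prod(x[i] for i in S) for x in points) / len(points)
            for S in subsets}

def noise_operator(f, n, delta):
    """Return the function whose coefficient on S is delta^|S| times that of f."""
    coeffs = fourier(f, n)
    def damped(x):
        return sum(delta ** len(S) * c * prod(x[i] for i in S)
                   for S, c in coeffs.items())
    return damped

def norm(g, n, q):
    """Expectation norm: (E_x |g(x)|^q)^(1/q)."""
    points = list(product([1, -1], repeat=n))
    return (sum(abs(g(x)) ** q for x in points) / len(points)) ** (1 / q)

def f(x):                      # an arbitrary real-valued function on 3 bits
    return 1 + x[0] + x[1] * x[2]

n, p, r = 3, 4, 2
delta = sqrt((r - 1) / (p - 1))
lhs = norm(noise_operator(f, n, delta), n, p)
rhs = norm(f, n, r)
print(lhs, rhs)                # the theorem asserts lhs <= rhs
```

With the damping set to zero only the empty-set coefficient survives, so the operator returns the constant function equal to the mean of f.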

Beckner/Nelson/Bonami Corollary 1: for any real f and 2≥r≥ 1 Corollary 2: for real f and r>2


Long-code Tests
Def (a long-code test): given a code-word w, probe it in a constant number of entries, and
- accept w.h.p. if w is a monotone dictatorship
- reject w.h.p. if w is not close to any monotone dictatorship

Efficient Long-code Tests
For some applications, it suffices if the test may accept illegal code-words, provided they have a short list-decoding:
Def (a long-code list-test): given a code-word w, probe it in 2 or 3 places, and
- accept w.h.p. if w is a monotone dictatorship,
- reject w.h.p. if w is not even approximately determined by a short list of domain elements, that is, if $\nexists$ a junta $J \subseteq [n]$ s.t. f is close to f' and $f'(x) = f'(x \cap J)$ for all x
Note: a long-code list-test distinguishes between the case where w is a dictatorship and the case where w is far from a junta.