Rank Aggregation Methods for the Web CS 728

Rank Aggregation Methods for the Web CS 728 Lecture 11

Web Page Ranking Methods Reviewed • PageRank – global link analysis • Indegree – local link analysis • HITS – topic-based link analysis • Voting – NNN and correlation • Graph distance from seed • URL length and depth • Text-based methods (e.g., tf*idf)

Rank Aggregation • Goal: a “consensus” ranking of all candidates • [Figure: several input rankings over candidates A to F combined into one consensus ranking]

Notations for Ranking • Given a universe U and an ordered list τ of a subset S of U: τ = [x1 ≥ x2 ≥ … ≥ xd], xi in S • τ(i): position or rank of i • |τ|: number of elements • Full list: τ contains all the elements of U • Partial list: τ ranks only some of the elements of U • Top d list: all d ranked elements are above all unranked elements • Question: when are two orderings similar? Can you give a distance measure?

Measuring Distance Between Orderings • Spearman’s Footrule Distance – σ, τ: two full lists – σ(i): rank of candidate i – F(σ, τ) = Σ_i |σ(i) − τ(i)| • Kendall tau distance – K(σ, τ): count the number of pairwise disagreements between the two lists

Example of Ordered-List Distance • S = {A, B, C, D, E}; σ, τ: two full lists • Positions 1 to 5: σ = (A, C, E, D, B), τ = (C, A, B, D, E) • Spearman’s Footrule Distance – F(σ, τ) = 1 + 2 + 1 + 0 + 2 = 6 (terms for A, B, C, D, E in that order) • Kendall tau distance – K(σ, τ) = |{(A, C), (B, D), (B, E), (D, E)}| = 4
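
The two distances in this example can be checked with a few lines of code. This is a minimal sketch for full lists only; the function names are illustrative and not from the slides.

```python
from itertools import combinations

def positions(ranking):
    """Map each candidate to its 1-based position in the list."""
    return {c: i for i, c in enumerate(ranking, start=1)}

def footrule(sigma, tau):
    """Spearman footrule: sum over candidates of |sigma(i) - tau(i)|."""
    s, t = positions(sigma), positions(tau)
    return sum(abs(s[c] - t[c]) for c in s)

def kendall_tau(sigma, tau):
    """Kendall tau: number of pairs the two lists order differently."""
    s, t = positions(sigma), positions(tau)
    return sum(
        1
        for a, b in combinations(s, 2)
        if (s[a] - s[b]) * (t[a] - t[b]) < 0
    )

sigma = ["A", "C", "E", "D", "B"]
tau = ["C", "A", "B", "D", "E"]
print(footrule(sigma, tau), kendall_tau(sigma, tau))  # 6 4
```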

Optimal ranking aggregation • Optimality depends on the distance measure we use. • Optimizing with Kendall tau distance, we obtain Kemeny optimal aggregation • It can be shown to satisfy neutrality and consistency, important properties of rank aggregation functions. • Useful but computationally hard: Kemeny optimal aggregation is NP-hard. • Will show that footrule-optimal aggregation is in P.

Two properties relate K and F • For any full lists σ, τ: K(σ, τ) ≤ F(σ, τ) ≤ 2K(σ, τ) • So footrule-optimal aggregation gives a 2-approximation to Kemeny optimality • If σ is the Kemeny optimal aggregation of full lists τ1, …, τk and σ′ optimizes the footrule aggregation, then K(σ′, τ1, …, τk) ≤ 2K(σ, τ1, …, τk)
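
The 2-approximation follows by chaining the two inequalities over the input lists. A sketch, assuming K(σ, τ1, …, τk) and F(σ, τ1, …, τk) denote the sum of the per-list distances (the average works the same way):

```latex
\begin{align*}
K(\sigma', \tau_1, \dots, \tau_k)
  &\le F(\sigma', \tau_1, \dots, \tau_k)
     && \text{since } K \le F \text{ for each } \tau_i \\
  &\le F(\sigma, \tau_1, \dots, \tau_k)
     && \text{since } \sigma' \text{ minimizes } F \\
  &\le 2\,K(\sigma, \tau_1, \dots, \tau_k)
     && \text{since } F \le 2K \text{ for each } \tau_i
\end{align*}
```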

Condorcet Criteria and SPAM Filters • Notation: τi is the ranking of voter i, σ is the aggregate ranking • Condorcet Criterion – An element of S that beats every other element in pairwise simple majority voting should be ranked first. • Extended Condorcet Criterion (XCC): – Version 1: If most voters prefer candidate a to candidate b (i.e., the number of i such that τi(a) < τi(b) is at least n/2), then σ should also prefer a to b (i.e., σ(a) < σ(b)). – Version 2: If there is a partition (C, D) of S such that for any x in C and y in D the majority prefers x to y, then every x in C must be ranked above every y in D.

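XCC (ver 1): Not always realizable • Voters (a b c), (b a c), (c a b): majorities prefer a to b, b to c, and a to c, so σ(a) < σ(b) < σ(c) is realizable • Voters forming a cycle, e.g. (a b c), (b c a), (c a b): majorities prefer a to b, b to c, and c to a, so no σ is consistent with all of them: not realizable • XCC (ver 2) is effective and can be used in ‘spam-fighting’ and meta-search.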
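
Whether XCC (ver 1) is realizable for a small profile can be checked by brute force. The sketch below assumes full lists and a strict majority; the cyclic profile is the classic Condorcet paradox, used here as an assumed illustration, and the function names are hypothetical.

```python
from itertools import permutations

def majority_prefers(voters, a, b):
    """True if a strict majority of voters rank a above b."""
    wins = sum(1 for v in voters if v.index(a) < v.index(b))
    return wins > len(voters) / 2

def xcc1_ordering(voters):
    """Return an ordering that agrees with every pairwise majority, or None.

    Tries all orderings, so it is only suitable for tiny examples.
    """
    candidates = voters[0]
    for order in permutations(candidates):
        pos = {c: i for i, c in enumerate(order)}
        if all(
            pos[a] < pos[b]
            for a, b in permutations(candidates, 2)
            if majority_prefers(voters, a, b)
        ):
            return list(order)
    return None

realizable = [list("abc"), list("bac"), list("cab")]
cycle = [list("abc"), list("bca"), list("cab")]
print(xcc1_ordering(realizable))  # ['a', 'b', 'c']
print(xcc1_ordering(cycle))       # None
```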

Voting Theory: Desired Properties • Given a set of candidates and voter preferences τ1, …, τn, seek an algorithm that produces a ranking σ of the candidates and satisfies a set of desired properties • Which combinations of properties are realizable? • 1) Independence from Irrelevant Alternatives: the relative order of a and b in σ should depend only on the relative order of a and b in τ1, …, τn. – Ex: if τi = (a b c) changes to (a c b), the relative order of a, b in σ should not change.

Desired Properties: • 2) Neutrality: no candidate should be favored over others. – If two candidates switch positions in τ1, …, τn, they should also switch positions in σ. • 3) Anonymity: no voter should be favored over others. – If two voters switch their orderings, σ should remain the same.

Desired Properties: • 4) Monotonicity: if the ranking of a candidate is improved by a voter, its ranking in σ can only improve. • 5) Consistency: if voters are split into two disjoint sets, S and T, and both the aggregation of voters in S and the aggregation of voters in T prefer a to b, then the aggregation of all voters should also prefer a to b.

Desired Properties • 6) No Dictatorship: f(τ1, …, τn) ≠ τi (no single voter i always dictates the output) • 7) Unanimity (a.k.a. Pareto optimality): if all voters prefer candidate a to candidate b (i.e., τi(a) < τi(b) for all i), then σ should also prefer a to b (i.e., σ(a) < σ(b)).

Desired Properties • 8) Democracy: satisfies the extended Condorcet criterion (XCC). – Always works for m = 2 candidates. – Not always realizable for m ≥ 3. • Theorem [May, 1952]: For m = 2, Democracy is the only rank aggregation function that is monotone, neutral, and anonymous.

Arrow’s Impossibility Theorem [Arrow, 1951] • Theorem: If m ≥ 3, then the only rank aggregation function that is unanimous and independent from irrelevant alternatives is dictatorship. – Won Nobel prize (1972)

Borda’s method • Easy and intuitive; several “score-based” variants; 1781 • Violates independence from irrelevant alternatives • Bi(c) = the number of candidates ranked below c in τi • B(c) = Σ_i Bi(c); candidates are sorted in decreasing order of B(c) • [Figure: four example voter lists over candidates C1, C2, … with per-list scores; e.g., Bi(C8) = 1 in a list where only C10 is ranked below C8]
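
Borda scoring over full lists can be sketched as follows. The voter lists are a made-up example, and breaking ties alphabetically is an assumption; the slides do not specify a tie rule.

```python
def borda(voters):
    """Rank candidates by total Borda score over full lists.

    B_i(c) is the number of candidates ranked below c by voter i.
    """
    scores = {c: 0 for c in voters[0]}
    for tau in voters:
        n = len(tau)
        for position, c in enumerate(tau):
            scores[c] += n - 1 - position  # candidates ranked below c
    # Decreasing total score; ties broken alphabetically (an assumption).
    ranking = sorted(scores, key=lambda c: (-scores[c], c))
    return ranking, scores

voters = [list("abcd"), list("bacd"), list("abdc")]
ranking, scores = borda(voters)
print(ranking)  # ['a', 'b', 'c', 'd']
print(scores)   # {'a': 8, 'b': 7, 'c': 2, 'd': 1}
```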

Partial lists • Handle partial lists by dividing all the excess scores equally among the unranked candidates • Example: 100 candidates, 70 of them ranked (scores 100 down to 31) => the leftover scores 1 to 30 sum to 465, so assign 465/30 = 15.5 to each of the 30 unranked candidates
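
The arithmetic of the example can be reproduced with a short sketch. It assumes the 1 to n score scale used in the example (top candidate gets n) and that the unranked candidates split the leftover scores 1 to n − d equally; the candidate names are placeholders.

```python
def borda_scores_partial(ranked, universe):
    """Borda scores on a 1..n scale for one partial list."""
    n = len(universe)
    # Ranked candidates get n, n - 1, ... from the top down.
    scores = {c: n - pos for pos, c in enumerate(ranked)}
    unranked = [c for c in universe if c not in scores]
    if unranked:
        # Leftover scores 1..(n - d), shared equally.
        leftover = sum(range(1, len(unranked) + 1))
        share = leftover / len(unranked)
        for c in unranked:
            scores[c] = share
    return scores

universe = [f"c{i}" for i in range(1, 101)]  # 100 placeholder candidates
ranked = universe[:70]                       # 70 of them are ranked
scores = borda_scores_partial(ranked, universe)
print(scores["c1"], scores["c70"], scores["c71"])  # 100 31 15.5
```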

Footrule optimal aggregation • Footrule optimal aggregation can be computed in polynomial time. • It is a good approximation (within a factor of 2) of Kemeny optimal aggregation. • Proof: via minimum cost perfect matching
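
One common way to set up the matching is a bipartite graph between candidates and positions, where placing candidate c at position p costs Σ_i |τi(c) − p|. The sketch below builds that cost table and, purely for illustration, finds the cheapest perfect matching by exhaustive search; a polynomial-time solver such as the Hungarian algorithm would replace the search in practice. The voter lists are a made-up example.

```python
from itertools import permutations

def footrule_optimal(voters):
    """Footrule-optimal aggregation of full lists (tiny inputs only)."""
    candidates = list(voters[0])
    n = len(candidates)
    # cost[c][p]: total displacement if candidate c is placed at position p.
    cost = {
        c: [sum(abs(tau.index(c) - p) for tau in voters) for p in range(n)]
        for c in candidates
    }
    best_order, best_cost = None, None
    # Each ordering is one perfect matching of candidates to positions.
    for order in permutations(candidates):
        total = sum(cost[c][p] for p, c in enumerate(order))
        if best_cost is None or total < best_cost:
            best_order, best_cost = list(order), total
    return best_order, best_cost

voters = [list("abcd"), list("bacd"), list("abdc")]
order, total = footrule_optimal(voters)
print(order, total)  # ['a', 'b', 'c', 'd'] 4
```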

Markov Chain method for rank aggregation. • States=candidates • Transitions depend on the preference orders given by voters • Basic idea: probabilistically switch to a “better candidate” • Rank candidates based on stationary probabilities!

Markov chain advantages • Handling partial list and top d list by using available comparisons to infer new ones • Handling uneven comparison and list length • Computation efficiency – O(NK) preprocessing, O(K) per step for about O(N) steps

Four ways to build the transition matrix • Current state is candidate a. • MC1: Choose uniformly from the multiset of all candidates that were ranked at least as high as a by some voter. – Probability to stay at a: ~ average rank of a. • MC2: Choose a voter i uniformly at random and pick uniformly at random from among the candidates that the i-th voter ranked at least as high as a. • MC3: Choose a voter i uniformly at random and pick a candidate b uniformly at random. If the i-th voter ranked b higher than a, go to b. Otherwise, stay at a. • MC4: Choose a candidate b uniformly at random. If most voters ranked b higher than a, go to b. Otherwise, stay at a. – Rank of a ~ # of “pairwise contests” a wins.
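
As an illustration, MC4 can be sketched for full lists as below. The small uniform "teleport" probability is an assumption added here so that the chain has a unique stationary distribution (without it, a Condorcet winner is an absorbing state); it is not part of the slides. Function names and the voter lists are illustrative.

```python
def mc4_matrix(voters):
    """Row-stochastic MC4 transition matrix for full lists."""
    candidates = list(voters[0])
    n = len(candidates)
    P = [[0.0] * n for _ in range(n)]
    for i, a in enumerate(candidates):
        for j, b in enumerate(candidates):
            if i == j:
                continue
            b_wins = sum(1 for v in voters if v.index(b) < v.index(a))
            if b_wins > len(voters) / 2:
                P[i][j] = 1.0 / n  # b is picked with prob. 1/n and beats a
        P[i][i] = 1.0 - sum(P[i])  # otherwise stay at a
    return candidates, P

def stationary(P, teleport=0.05, steps=500):
    """Power iteration on the chain mixed with a uniform jump (assumed)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(steps):
        nxt = [0.0] * n
        for i in range(n):
            for j in range(n):
                nxt[j] += pi[i] * ((1 - teleport) * P[i][j] + teleport / n)
        pi = nxt
    return pi

def mc4_rank(voters):
    """Rank candidates by decreasing stationary probability."""
    candidates, P = mc4_matrix(voters)
    pi = stationary(P)
    return sorted(candidates, key=lambda c: -pi[candidates.index(c)])

voters = [list("abc"), list("bac"), list("cab")]
print(mc4_rank(voters))  # ['a', 'b', 'c']
```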

Locally Kemeny optimal aggregation • A locally Kemeny optimal aggregation satisfies the extended Condorcet property XCC(2) and can be computed in “k · O(n log n)” time; worst case, O(n^2) • Many existing aggregation methods do not satisfy XCC. • It is possible to use your favorite aggregation method to obtain a full list, then apply local Kemenization to achieve XCC.

Locally Kemeny optimal • Recall that Kemeny optimal is NP-hard • Definition of locally optimal: a permutation p is a locally Kemeny optimal aggregation of partial lists t1, t2, …, tk if there is no permutation p′ that can be obtained from p by performing a single transposition of an adjacent pair of elements and for which K(p′, t1, t2, …, tk) < K(p, t1, t2, …, tk). In other words, it is impossible to reduce the total distance to the t's by flipping an adjacent pair.

Example of LKO but not KO • t1 = (1, 2), t2 = (2, 3), t3 = t4 = t5 = (3, 1) • p = (1, 2, 3) satisfies the definition of LKO with K(p, t1, t2, …, t5) = 3, but transposing 1 and 3 (a non-adjacent pair) decreases the sum to 2, so p is not Kemeny optimal.
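
The example can be verified mechanically. The sketch assumes that the distance from p to a partial list t counts only the pairs that t ranks, and that the total is the sum over the lists; function names are illustrative.

```python
def kendall_partial(p, t):
    """Number of pairs ranked by t whose order is reversed in p."""
    pos = {c: i for i, c in enumerate(p)}
    return sum(
        1
        for i in range(len(t))
        for j in range(i + 1, len(t))
        if pos[t[i]] > pos[t[j]]
    )

def total_k(p, lists):
    """Total Kendall distance from p to all the lists."""
    return sum(kendall_partial(p, t) for t in lists)

def is_locally_kemeny_optimal(p, lists):
    """True if no single adjacent transposition lowers the total distance."""
    base = total_k(p, lists)
    for i in range(len(p) - 1):
        q = list(p)
        q[i], q[i + 1] = q[i + 1], q[i]
        if total_k(q, lists) < base:
            return False
    return True

lists = [(1, 2), (2, 3), (3, 1), (3, 1), (3, 1)]
print(total_k((1, 2, 3), lists))                    # 3
print(is_locally_kemeny_optimal((1, 2, 3), lists))  # True
print(total_k((3, 2, 1), lists))                    # 2
```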

LKO satisfies XCC(2) • Proof by contradiction: if the result is false, then there exist partial lists t1, t2, …, tk, an LKO aggregation p, and a partition (C, D) that violates XCC(2): for some c in C and d in D, p(d) < p(c), although a majority prefers c over d. Let (d, c) be the closest such pair in p. • Consider the immediate successor of d in p, call it e. If e = c, then c is adjacent to d in p, and transposing this adjacent pair of alternatives produces a p′ such that K(p′, t1, t2, …, tk) < K(p, t1, t2, …, tk), contradicting the assumption on p. • If e does not equal c, then either e is in C, in which case the pair (d, e) is a closer pair in p than (d, c) and also violates the extended Condorcet condition, or e is in D, in which case (e, c) is a closer pair than (d, c) that violates the extended Condorcet condition. Both cases contradict the choice of (d, c).

Local Kemenization procedure • A local Kemenization of a full list μ with respect to the preference lists t1, …, tk computes a locally Kemeny optimal aggregation that is maximally consistent with μ. • This approach: (1) preserves the strengths of the initial aggregation; (2) ranks non-spam above spam; (3) gives a result that disagrees with μ on a pair (i, j) only if a majority endorses this disagreement; (4) for every d, 1 ≤ d ≤ |μ|, the restriction of the output to the top d elements of μ is a local Kemenization of the top d elements of μ.

Local Kemenization procedure • A simple inductive construction. • Assume inductively that we have constructed p, a local Kemenization of the projection of the t's onto the elements 1, …, l−1 of the initial list. • Insert the next element x into the lowest-ranked "permissible" position in p: just below the lowest-ranked element y in p such that – (a) no majority among the (original) t's prefers x to y and – (b) for all successors z of y in p there is a majority that prefers x to z. • In other words, we try to insert x at the end (bottom) of the list p; we bubble it up toward the top of the list as long as a majority of the t's insists that we do.
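
The insert-and-bubble-up construction can be sketched as follows. "Majority" is taken here to mean that strictly more lists rank x above y than y above x, counting only lists that rank both; this reading, the function names, and the voter lists are assumptions for illustration.

```python
def majority_prefers(lists, x, y):
    """True if strictly more lists rank x above y than y above x."""
    for_x = against_x = 0
    for t in lists:
        if x in t and y in t:
            if t.index(x) < t.index(y):
                for_x += 1
            else:
                against_x += 1
    return for_x > against_x

def local_kemenize(mu, lists):
    """Local Kemenization of the full list mu with respect to the lists."""
    p = []
    for x in mu:  # process the elements of mu from top to bottom
        pos = len(p)  # start at the bottom of p
        # Bubble x up while a majority prefers it to the element above.
        while pos > 0 and majority_prefers(lists, x, p[pos - 1]):
            pos -= 1
        p.insert(pos, x)
    return p

voters = [list("abcd"), list("abcd"), list("badc")]
print(local_kemenize(list("dcba"), voters))  # ['a', 'b', 'c', 'd']
```

Each insertion compares x only with elements already placed, so an initial list that is already locally Kemeny optimal is returned unchanged.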

How do we perform local Kemenization? • Local Kemenization example • [Figure: worked example on candidates A to F showing the voter lists, the initial aggregation, and the result; pairwise tallies shown include A>B: 3, A<B: 2 and B>D: 4, B<D: 1]

Experiments: Induced and Normalized Distance Measures • (1) The Spearman footrule distance is the sum of pointwise distances. It is normalized to lie between 0 and 1 by dividing by the maximum value (1/2)|S|^2. • (2) The Kendall tau distance counts the number of pairwise disagreements. Dividing by the maximum possible value (1/2)|S|(|S| − 1), we obtain a normalized version. • (3) The induced footrule distance is obtained by taking the projection of a full list s onto each partial list. In a similar manner, the induced Kendall tau distance can be defined. • (4) The scaled footrule distance weights contributions of elements based on the length of the lists they are present in. If s is a full list and t is a partial list, then SF(s, t) = Σ_{i in t} |s(i)/|s| − t(i)/|t||. We normalize SF by dividing by |t|/2.
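
The scaled footrule distance in (4) can be computed directly from the formula. A minimal sketch with 1-based positions; the lists are a made-up example and the function name is illustrative.

```python
def scaled_footrule(s, t, normalize=True):
    """Scaled footrule distance between full list s and partial list t.

    Sums |s(i)/|s| - t(i)/|t|| over the elements i ranked by t, then
    optionally divides by |t|/2.
    """
    s_pos = {c: i for i, c in enumerate(s, start=1)}
    t_pos = {c: i for i, c in enumerate(t, start=1)}
    total = sum(abs(s_pos[c] / len(s) - t_pos[c] / len(t)) for c in t)
    return total / (len(t) / 2) if normalize else total

s = ["a", "b", "c", "d"]
t = ["b", "a"]
# a: |1/4 - 2/2| = 0.75, b: |2/4 - 1/2| = 0
print(scaled_footrule(s, t, normalize=False))  # 0.75
print(scaled_footrule(s, t))                   # 0.75 (divided by |t|/2 = 1)
```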

Experiments: meta-search • Legend: K = Kendall distance, IF = induced footrule distance, SF = scaled footrule distance, LK = Local Kemenization