Math & Stat Department Colloquium, Georgetown University, March 20, 2020

Security and Privacy for Distributed Optimization and Learning
Nitin Vaidya, Georgetown University

Goals
  • Background
  • Problem formulation
  • Intuition
No theorems/proofs

Rendezvous (diagram with meeting point X)

Averaging

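The averaging slides are diagrams, so as context here is a minimal sketch of iterative neighbor averaging (consensus), the building block the later distributed-optimization slides rely on. The weight matrix W and the path graph are illustrative assumptions, not taken from the slides.

```python
import numpy as np

# Iterative averaging (consensus): x[t+1] = W x[t], with W doubly stochastic
# over the communication graph. Here W uses Metropolis weights for the path
# graph 1 - 2 - 3 (an illustrative choice; the slides do not specify a graph).
W = np.array([[2/3, 1/3, 0.0],
              [1/3, 1/3, 1/3],
              [0.0, 1/3, 2/3]])

x = np.array([1.0, 5.0, 9.0])   # initial values held by the three agents
for _ in range(100):
    x = W @ x                   # each agent averages itself with its neighbors
# Every entry of x is now close to the average of the initial values, 5.0.
```
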
Machine Learning
  • Data is distributed across different agents (Agent 1, Agent 2, Agent 3, Agent 4)
  • Agents collaborate to learn

Classification (figure credit: G. Renee Guzlas)

Gradient Method
  • Iterates x[0], x[1], x[2], x[3], …

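The slide's picture of the iterates x[0], x[1], x[2], x[3] suggests plain gradient descent; a minimal sketch, with the step size and iteration count as my own choices:

```python
# Plain gradient descent: x[k+1] = x[k] - alpha * f'(x[k]).
def gradient_descent(grad, x0, alpha=0.1, iters=100):
    """grad: callable returning the gradient at x; x0: initial iterate."""
    x = x0
    trajectory = [x]              # x[0], x[1], x[2], ...
    for _ in range(iters):
        x = x - alpha * grad(x)   # move against the gradient
        trajectory.append(x)
    return trajectory

# Example: f(x) = (x - 3)^2, so grad(x) = 2*(x - 3); the iterates approach 3.
xs = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```
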
Distributed Optimization

1984

Architectures (diagram)

Distributed Optimization: Iterative algorithm
  • Each agent maintains a local estimate of the optimum x
  • Local estimates are shared with neighbors in each iteration
  • Local estimates converge to the optimum

Example with agent estimates x1, x2, x3 (based on [Nedic and Ozdaglar, 2009])

Distributed Optimization: In the limit as t → ∞
  • Consensus: all agents converge to the same estimate
  • Optimality: the common estimate minimizes the total cost

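A hedged sketch of the consensus-plus-gradient update in the spirit of [Nedic and Ozdaglar, 2009]: each agent mixes its neighbors' estimates and then takes a step along its own local gradient. The mixing matrix W, step size, and quadratic costs below are illustrative assumptions.

```python
import numpy as np

# Consensus + gradient step:
#   x_i[t+1] = sum_j W[i, j] * x_j[t] - alpha * grad_i(x_i[t])
# W is a doubly-stochastic mixing matrix over the communication graph.
def distributed_gradient_step(x, W, grads, alpha):
    mixed = W @ x                       # each agent averages its neighbors
    local = np.array([g(xi) for g, xi in zip(grads, x)])
    return mixed - alpha * local        # then takes a local gradient step

# Three agents with quadratic costs f_i(x) = (x - c_i)^2; the optimum of
# the summed cost is the mean of the c_i.
c = np.array([1.0, 2.0, 6.0])
grads = [lambda x, ci=ci: 2 * (x - ci) for ci in c]
W = np.array([[0.5, 0.25, 0.25],
              [0.25, 0.5, 0.25],
              [0.25, 0.25, 0.5]])
x = np.zeros(3)
for t in range(200):
    x = distributed_gradient_step(x, W, grads, alpha=0.05)
# All entries of x approach a neighborhood of the minimizer of the total
# cost, mean(c) = 3 (exact convergence needs a diminishing step size).
```
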
Why does this work?

Architectures (diagram)

Parameter Server (architecture diagram)

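The parameter-server slides are diagrams, so the loop below is a hedged reconstruction of the standard pattern they depict: the server holds the estimate x, the agents evaluate their local gradients at x, and the server aggregates them. The names and the plain-averaging rule are my assumptions.

```python
import numpy as np

def parameter_server(local_grads, x0, alpha=0.1, rounds=100):
    """Server keeps the estimate x; each round every agent evaluates its
    local gradient at the current x and the server averages them."""
    x = x0
    for _ in range(rounds):
        g = np.mean([grad(x) for grad in local_grads], axis=0)  # aggregate
        x = x - alpha * g                                       # server update
    return x

# Example: two agents with costs (x-1)^2 and (x-5)^2; the loop drives x
# toward the minimizer of their sum, x = 3.
x_star = parameter_server([lambda x: 2*(x - 1), lambda x: 2*(x - 5)], x0=0.0)
```

Averaging the gradients makes this ordinary gradient descent on the sum of the local costs; the fault-tolerance part of the talk swaps the plain average for a filter.
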
Many Variations
  • stochastic optimization
  • asynchronous
  • gradient compression
  • acceleration
  • shared memory

Architectures (diagram)

Challenges

Challenges
  • Fault-tolerant distributed optimization: minimize f1(x) + f2(x) + f3(x). How to optimize if agents inject bogus information?

Challenges
  • Privacy-preserving distributed optimization: how to collaborate without revealing one's own cost function?

Fault-Tolerant Optimization: 2015 …

Byzantine Fault Model
  • No constraint on the misbehavior of faulty agents

Rendezvous (diagram with meeting point X)

Machine Learning
  • A faulty agent can adversely affect the model parameters

Fault-Tolerance
  • What should be the objective of fault-tolerant optimization?
  • Optimize over only the good agents … set G

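In symbols (notation mine), with G the set of good agents and f_i agent i's cost, the objective the slide points to is
\[ \min_x \; \sum_{i \in G} f_i(x), \]
the difficulty being that nobody knows which agents are in G, so this sum cannot be evaluated directly.
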
Parameter Server (diagram)

Is this achievable? It depends:
  • Independent functions
  • “Enough” redundancy

Independent Functions (example with cost functions a, b, c)
  • Provably impossible to compute the exact optimum of the good agents' aggregate cost

Is this achievable?
  • Independent functions: Approximate
  • “Enough” redundancy: Exact

An Example of Redundancy

n = 6 agents, t = 1 faulty agent (figure credit: G. Renee Guzlas)

Parameter Server (diagram)

Norm Filter
  • Exact optimum computed despite faulty agents

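The slide does not spell out the filtering rule, so the sketch below is one standard instantiation of a gradient norm filter (my assumption): with at most t faulty agents, drop the t received gradients with the largest norms and average the rest.

```python
import numpy as np

def norm_filter(gradients, t):
    """gradients: array of shape (n, d), one gradient per agent.
    Discard the t gradients with the largest norms, average the remainder."""
    gradients = np.asarray(gradients)
    norms = np.linalg.norm(gradients, axis=1)
    keep = np.argsort(norms)[: len(gradients) - t]   # drop t largest norms
    return gradients[keep].mean(axis=0)

# Server update using the filtered aggregate:
#   x = x - alpha * norm_filter(received_gradients, t)
```

With the kind of redundancy illustrated above, the slide's claim is that such a filter lets the server reach the exact optimum despite the faulty agents.
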
Another Example of Redundancy

Another Example of Redundancy: Machine learning
  • Agents draw samples from an identical data distribution
  • Filter on stochastic gradients

Is this achievable?
  • Independent functions: Approximate
  • “Enough” redundancy: Exact

How to approximate the good agents' aggregate cost?
  • Ideal goal: equal weight for all non-faulty agents
  • Approximation: unequal weights

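One way to write the contrast (notation mine): the ideal objective weights every good agent's cost equally, whereas a filter-based method can only guarantee some convex combination with possibly unequal weights:
\[ \text{ideal: } \min_x \tfrac{1}{|G|}\sum_{i\in G} f_i(x), \qquad \text{achievable: } \min_x \sum_{i\in G} \alpha_i f_i(x), \quad \alpha_i \ge 0, \ \sum_{i\in G}\alpha_i = 1 . \]
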
Results
  • For each faulty agent, a good agent may be ignored (weight 0)
  • But the remaining good agents can get almost uniform importance

Results
  • Cannot compute the exact optimum of the good agents' aggregate cost

Parameter Server (diagram)

Privacy-Preserving Optimization: 2016 …

Communication Leaks Information
  • The server can use gradients to infer polynomial cost functions (up to a constant)

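A small worked example of why gradients leak the cost function (my example, not from the slides): if f_i(x) = a x^2 + b x + c, then gradients reported at two points determine a and b:
\[ \nabla f_i(x) = 2ax + b \;\Rightarrow\; a = \frac{\nabla f_i(x_1) - \nabla f_i(x_2)}{2(x_1 - x_2)}, \quad b = \nabla f_i(x_1) - 2 a x_1 , \]
so the server learns f_i up to the constant c. More generally, a degree-d polynomial cost is pinned down (up to a constant) by d gradient evaluations at distinct points.
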
Related Work
  • Cryptographic methods
  • Transformation methods
  • Differential privacy: query → database + noise → perturbed output

Our Approach
  • Motivated by secret sharing & differential privacy
  • Add cancellable noise

Multiple Parameter Servers (Server 1, Server 2), with a consensus step

Improving Privacy (Server 1, Server 2)
  • Noise terms satisfy ε1 + ε2 = 0 over time

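A hedged sketch of the cancellable-noise idea with two servers (the function name and the exact sharing scheme are my illustration): each agent splits its gradient into two perturbed shares whose noise terms cancel, so either server alone sees a noised value while the two shares together still sum to the true gradient.

```python
import numpy as np

rng = np.random.default_rng(0)

def split_with_cancellable_noise(grad, scale=1.0):
    """Split one gradient into shares for Server 1 and Server 2.
    Each share alone is perturbed; the perturbations cancel: eps1 + eps2 = 0."""
    eps = rng.normal(0.0, scale, size=np.shape(grad))
    share1 = grad / 2 + eps      # sent to Server 1
    share2 = grad / 2 - eps      # sent to Server 2
    return share1, share2        # share1 + share2 == grad exactly

# After the servers' consensus step recombines the shares, the noise
# cancels, so optimality is not compromised.
```
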
Convex Sum of Non-Convex Functions (Server 1, Server 2)
  • “Privacy” if at least one server is non-adversarial

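A concrete instance of such a decomposition (my example): the convex cost x^2 can be split as
\[ x^2 = \Bigl(\tfrac{x^2}{2} + 3\sin x\Bigr) + \Bigl(\tfrac{x^2}{2} - 3\sin x\Bigr), \]
where each share is non-convex but their sum is the original convex function; Server 1 and Server 2 each see only one share.
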
Privacy + Fault-tolerance
  • Fault-tolerance: gradient filters
  • Privacy: cancellable noise

Decentralized Control/Optimization and Distributed Computing (picture from Wikipedia)

Lili Su, Shripad Gade, Dimitrios Pylorof, Nirupam Gupta, Shuo Liu

Thanks! disc.georgetown.domains

Net-X: Multi-Channel Mesh
  • Fixed and switchable channels, capacity (diagram with nodes A–F)
  • Theory to practice: Net-X testbed, capacity bounds, insights on protocol design, OS improvements, software architecture
  • Software stack: User Applications, Multi-channel protocol, IP Stack, ARP, Channel Abstraction Module, Interface Device Driver (Linux box, CSL)

  • Hajnal 1958: Weak ergodicity of nonhomogeneous Markov chains
  • DeGroot 1974: Reaching a consensus
  • Distributed Computing: Pease, Shostak, Lamport 1980 (Byzantine consensus); Fischer, Lynch, Paterson 1983 (asynchronous consensus impossibility result); Dolev et al. 1986 (approximate Byzantine consensus)
  • Decentralized Control: Tsitsiklis 1984; Jadbabaie, Lin, Morse 2003 (flocking problem); Nedic, Ozdaglar 2009

Distributed Optimization
  • Each agent maintains an estimate
  • Local estimates shared with neighbors & updated
  • Estimates converge to the optimum

Example with agent estimates x1, x2, x3 (based on [Nedic and Ozdaglar, 2009])

Distributed Optimization: As time → ∞
  • Consensus: all agents converge to the same estimate
  • Optimality

t faulty agents
  • Discard the smallest t and largest t gradients; average the rest (example: t = 1)
  • What does this achieve?

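A minimal sketch of that rule for a batch of scalar (or, coordinate-wise, vector) gradients, following the slide's description; the example values are mine:

```python
import numpy as np

def trimmed_mean(gradients, t):
    """Coordinate-wise trimmed mean: discard the t smallest and t largest
    values in each coordinate, then average the rest."""
    g = np.sort(np.asarray(gradients), axis=0)   # sort agents per coordinate
    return g[t: g.shape[0] - t].mean(axis=0)

# Example with t = 1: a single bogus value cannot drag the aggregate far.
print(trimmed_mean([2.0, 1.9, 2.1, 1000.0, 2.0], t=1))  # ~2.03
```
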
Our Ideal Goal

Independent Functions: How good an approximation?
  • Instead of uniform weights, the filter achieves unequal weights

Independent Functions
  • Cost functions of faulty nodes “filtered away”
  • At most t good cost functions also filtered away
  • Nearly uniform importance to the remaining costs

Independent Functions: How good an approximation?
  • Example weights: 0, 0, ¼, ¼, 0, 0 and ⅛, ⅛, ¼, ¼, 0, 0

Good News: Cost functions often naturally redundant

Good News: Cost functions often naturally redundant
  • Data sets at different agents may be drawn from the same distribution; in expectation, all cost functions are identical
  • Observations by different agents are conditioned on the same ground truth

Linear Regression

Distributed Linear Regression

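The regression formulas on these slides did not survive extraction; as a stand-in, here is the standard least-squares setup they presumably build on, with the data split across agents (notation mine): agent i holds (A_i, b_i) and its local cost is ||A_i x - b_i||^2.

```python
import numpy as np

# Distributed linear regression: agent i holds data (A_i, b_i) and the
# local cost f_i(x) = ||A_i x - b_i||^2, with gradient 2 A_i^T (A_i x - b_i).
# The agents' common goal is the minimizer of the summed cost.
def local_gradient(A_i, b_i, x):
    return 2 * A_i.T @ (A_i @ x - b_i)

# A parameter-server style loop over the agents' local gradients:
def distributed_least_squares(data, x0, alpha=1e-3, rounds=2000):
    x = x0
    for _ in range(rounds):
        g = sum(local_gradient(A_i, b_i, x) for A_i, b_i in data)
        x = x - alpha * g
    return x
```
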
2t-Redundancy … Linear Regression

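The details of this slide are lost; for context, the 2t-redundancy property used in this line of work (stated here from memory, so treat the exact quantifier as an assumption) says that any n - 2t of the cost functions already determine the same minimizer as all n of them:
\[ \arg\min_x \sum_{i \in S} f_i(x) \;=\; \arg\min_x \sum_{i=1}^{n} f_i(x) \quad \text{for every } S \text{ with } |S| \ge n - 2t . \]
Roughly, for linear regression this amounts to the stacked data matrices of every such subset having full column rank, so the solution is over-determined enough to survive t faulty agents.
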
Privacy Techniques: Differential privacy … add noise ε
  • Optimality compromised due to the noise

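For contrast with the cancellable-noise scheme above, a short sketch of the differential-privacy-style update the slide alludes to (noise scale, distribution, and step size are placeholders): each published gradient is independently perturbed, and because these perturbations never cancel, the iterates settle in a noise-limited neighborhood of the optimum rather than at the optimum itself.

```python
import numpy as np

rng = np.random.default_rng(1)

def noisy_step(x, grad, sigma=0.5, alpha=0.1):
    """One DP-style update: perturb the gradient with independent noise.
    The perturbations do not cancel, so accuracy is traded for privacy."""
    eps = rng.normal(0.0, sigma, size=np.shape(grad))
    return x - alpha * (np.asarray(grad) + eps)
```
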
Privacy Techniques: Homomorphic encryption
  • Expensive