Foundations of Privacy, Lecture 7. Lecturer: Moni Naor


Recap of last week's lecture
• Counting Queries
  – Hardness Results
  – Tracing Traitors and hardness results for general (non-synthetic) databases

Can We Output a Synthetic DB Efficiently?
[Table over |C| and |U| (subpoly vs. poly): a general algorithm works in the subpolynomial regime; otherwise not in general — hardness via signatures, and hard on average using PRFs.]

General Output Sanitizers
Theorem: Traitor-tracing schemes exist if and only if sanitizing is hard.
Tight connection between the |U|, |C| that are hard to sanitize and the key and ciphertext sizes in traitor tracing.
Separation between efficient and non-efficient sanitizers.

Traitor Tracing: The Problem
• Center transmits a message to a large group
• Some users leak their keys to pirates
• Pirates construct a clone: unauthorized decryption devices
[Figure: keys K1, K3, K8 inside a Pirate Box decrypting E(Content) to Content.]
• Given a pirate box, want to find who leaked the keys: the traitors' "privacy" is violated!

Traitor Tracing ⇒ Hard Sanitizing
A (private-key) traitor-tracing scheme consists of algorithms Setup, Encrypt, Decrypt and Trace.
Setup: generates a key bk for the broadcaster and N subscriber keys k1, ..., kN.
Encrypt: given a bit b, generates a ciphertext using the broadcaster's key bk. Need semantic security!
Decrypt: takes a given ciphertext and, using any of the subscriber keys, retrieves the original bit.

Simple Example of Tracing Traitors
• Let EK(m) be a good shared-key encryption scheme
• Key generation: generate independent keys for E: bk = k1, ..., kN
• Encrypt: for bit b, generate independent ciphertexts Ek1(b), Ek2(b), ..., EkN(b)
• Decrypt using ki: decrypt the ith ciphertext
• Tracing algorithm: using a hybrid argument
Properties: ciphertext length N, key length 1.
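
The hybrid argument behind the tracing algorithm can be sketched in code. This is a toy sketch, not the lecture's scheme: `enc`/`dec` form a stand-in one-bit cipher (a hash of key+nonce as a pad), and the pirate box is assumed to decrypt using a single leaked key. The tracer feeds the box hybrid ciphertext vectors in which positions below i encrypt a random bit; the hybrid step where the box's success rate drops exposes the traitor.

```python
import hashlib
import os
import random

random.seed(0)

def enc(key: bytes, bit: int) -> bytes:
    """Toy one-bit encryption: pad the bit with a hash of key+nonce."""
    nonce = os.urandom(8)
    pad = hashlib.sha256(key + nonce).digest()[0] & 1
    return nonce + bytes([bit ^ pad])

def dec(key: bytes, ct: bytes) -> int:
    nonce, c = ct[:8], ct[8]
    pad = hashlib.sha256(key + nonce).digest()[0] & 1
    return c ^ pad

N = 8
keys = [os.urandom(16) for _ in range(N)]
traitor = 5                      # the pirate box holds only user 5's key

def pirate_box(cts):
    return dec(keys[traitor], cts[traitor])

def success_rate(i: int, trials: int = 400) -> float:
    """Hybrid i: positions < i encrypt a random bit, positions >= i the real bit."""
    ok = 0
    for _ in range(trials):
        b = random.randrange(2)
        cts = [enc(keys[j], random.randrange(2) if j < i else b)
               for j in range(N)]
        ok += (pirate_box(cts) == b)
    return ok / trials

rates = [success_rate(i) for i in range(N + 1)]
# the hybrid step with the largest drop in success identifies the traitor
accused = max(range(N), key=lambda i: rates[i] - rates[i + 1])
```

Between hybrid i = traitor and i = traitor + 1 the box's success falls from 1 to about 1/2, so the argmax of the drops accuses the right user.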

Equivalence of TT and Hardness of Sanitizing
Traitor Tracing        ↔  Sanitizing hard for a distribution of DBs
Key                    ↔  Database entry
Ciphertext             ↔  (collection of) Queries
TT Pirate              ↔  Sanitizer

Traitor Tracing ⇒ Hard Sanitizing
Theorem: If there exists a TT scheme with
– ciphertext length c(n),
– key length k(n),
one can construct:
1. A query set C of size ≈ 2^c(n)
2. A data universe U of size ≈ 2^k(n)
3. A distribution D on n-user databases with entries from U
D is "hard to sanitize": there exists a tracer that can extract an entry in D from any sanitizer's output, violating its privacy!
Separation between efficient and non-efficient sanitizers.

Interactive Model
[Figure: analyst issues query 1, query 2, ... to a Sanitizer holding the Data.]
Multiple queries, chosen adaptively.

Counting Queries: Answering Queries Interactively
Database D of size n.
Counting queries: C is a set of predicates c: U → {0, 1}.
Query: how many participants in D satisfy c?
Relaxed accuracy: answer each query within α additive error w.h.p.
Not so bad: such error is anyway inherent in statistical analysis.

Can we answer queries when they are not given in advance?
• Can always answer with independent noise
  – Limited to a number of queries smaller than the database size.
• We do not know the future, but we do know the past!
  – Can answer based on past answers
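
The first bullet, answering each query with independent noise, is a one-liner. A minimal sketch, where the database, predicate, and ε are illustrative choices and the Laplace sample is drawn as a difference of two exponentials:

```python
import random

random.seed(0)

def laplace_noise(scale: float) -> float:
    # difference of two i.i.d. exponentials is Laplace-distributed
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def answer_counting_query(db, predicate, eps: float) -> float:
    """eps-DP answer to a single counting query: true count + Lap(1/eps)."""
    true_count = sum(1 for row in db if predicate(row))
    return true_count + laplace_noise(1 / eps)

db = [{"age": a} for a in (23, 37, 45, 52, 61, 29, 70, 33)]
ans = answer_counting_query(db, lambda r: r["age"] >= 40, eps=0.5)
```

Answering k queries this way costs ε per query (so ε/k each under a fixed budget), which is why the approach only scales to a number of queries smaller than the database size.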

Idea: Maintain a List of Possible Databases
• Start with D0 = list of all databases of size m
  – Gradually shrinks
• Each round j:
  – If list Dj-1 is representative: answer according to the average database in the list
  – Otherwise: prune the list (Dj-1 → Dj) to maintain consistency

• Initialize D0 = {all databases of size m over U}. Input: x*.
• Each round, Dj-1 = {x1, x2, ...} where each xi is of size m.
For each query c1, c2, ..., ck in turn:
• Let Aj ← Average over xi ∈ Dj-1 of min{dcj(x*, xi), T}   (low sensitivity! threshold T ≈ n^{1-…})
• If Aj is small (noisy threshold): answer according to the median db in Dj-1; Dj ← Dj-1
• If Aj is large: give the true answer according to x*, plus noise
  – remove all db's that are far away, to get Dj
Need two threshold values: one for "small" and one for "far".
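
The shrink-the-list loop can be sketched on a toy instance. This is a noiseless sketch of the pruning logic only (the real algorithm uses noisy thresholds and noisy answers, and the universe, sizes, and threshold below are arbitrary small choices):

```python
import itertools

U = [0, 1, 2, 3]                      # toy data universe
x_star = [0, 0, 1, 1, 2, 2, 3, 3]     # true database, n = 8
m = 4                                 # candidate database size
T = 0.25                              # pruning threshold (fractional)

# D0: all size-m multisets over U
cands = [list(c) for c in itertools.combinations_with_replacement(U, m)]

def frac(db, pred):
    return sum(map(pred, db)) / len(db)

queries = [lambda u: u < 2, lambda u: u % 2 == 0, lambda u: u == 3]

answers = []
for q in queries:
    true = frac(x_star, q)
    avg_err = sum(abs(frac(c, q) - true) for c in cands) / len(cands)
    if avg_err <= T:
        # "small" round: the list is representative, answer with its median
        vals = sorted(frac(c, q) for c in cands)
        answers.append(vals[len(vals) // 2])
    else:
        # "large" round: answer from x_star and prune far-away candidates
        answers.append(true)
        cands = [c for c in cands if abs(frac(c, q) - true) <= T]
```

Each large round removes every candidate whose answer is far from the truth, so the list shrinks by a constant fraction, which is the engine behind the bound on the number of large rounds below.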

Need to show
Accuracy and functionality:
• The result is accurate
• If Aj is large: many of the xi ∈ Dj-1 are removed
• Dj is never empty
Privacy (how much noise should we add?):
• Not many large Aj (the number of large rounds is bounded)
• Can release the identity of large rounds
• Can release noisy answers in large rounds.

The Number of Large Rounds is Bounded
• If Aj is large, then there must be many sets whose difference from the real value is close to T
  – Assuming the released answer is close to the true answer (with added noise): many sets are far from the released answer
  – Therefore many sets (a constant fraction) are pruned
Size of D0 ≈ (|U| choose m); total number of large rounds ≈ m·log|U|.
[Figure: a line marking the actual value, the threshold for "small", and the threshold for "far".]

Why is There a Good xi?
Database x* of size n; counting queries C, a set of predicates c: U → {0, 1}.
Query: how many participants of x* satisfy c?
Claim: a sample x of size m approximates x* on all given c1, c2, ..., ck.
The sample may be assumed to come from x*, since we start with all possible sets.
Existential proof: need not know c1, c2, ..., ck in advance.

Size m is Õ(n^{2/3}·log k)
For any c1, c2, ..., ck: there exists a set x of size m = Õ((n/α)²·log k) s.t. maxj distcj(x, x*) ≤ α,
where distcj(x, x*) = |(1/m)⟨cj, x⟩ − (1/n)⟨cj, x*⟩|.
For α = Õ(n^{2/3}·log k), this gives m = Õ(n^{2/3}·log k).
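
The sampling claim is easy to check empirically: a small random sample of x* answers all k counting queries nearly as well as x* itself. The universe, predicates, and sizes below are arbitrary illustrative choices:

```python
import random

random.seed(3)

n, m, k = 1000, 200, 20
x_star = [random.randrange(50) for _ in range(n)]         # database over U = {0..49}
thresholds = [random.randrange(1, 50) for _ in range(k)]  # k threshold predicates

sample = random.sample(x_star, m)
# worst-case (fractional) error of the sample over all k queries
max_err = max(abs(sum(u < t for u in sample) / m -
                  sum(u < t for u in x_star) / n)
              for t in thresholds)
```

A Chernoff + union bound over the k queries is exactly why m only needs to grow like log k.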

Why Can We Release When Large Rounds Occur?
• Do not expect more than O(m·log|U|) large rounds
• Make the threshold noisy
For every pair of neighboring databases x* and x'*:
• Consider the vector of threshold noises, of length k
• If a point is far away from the threshold: same in both

Privacy of the Identity of Large Rounds
Protect: the times of the large rounds.
For every pair of neighboring databases x* and x'*: can pair up noisy threshold vectors
  (η1, η2, ..., ηk−1, ηk, ηk+1, ...) ↔ (η1, η2, ..., ηk−1, η'k, ηk+1, ...) with η'k = ηk + 1.
For only a few (O(m·log|U|)) points is x* above the threshold and x'* below; can correct the threshold value there.
Prob ≈ e^ε.

Summary of Algorithms
Three algorithms:
• BLR
• DNRRV
• RR (with help from HR)

What if the Data is Dynamic?
• Want to handle situations where the data keeps changing
  – Not all data is available at the time of sanitization
[Figure: Curator/Sanitizer.]

Google Flu Trends
"We've found that certain search terms are good indicators of flu activity. Google Flu Trends uses aggregated Google search data to estimate current flu activity around the world in near real-time."

Example of Utility: Google Flu Trends

What if the Data is Dynamic?
• Want to handle situations where the data keeps changing
  – Not all data is available at the time of sanitization
Issues:
• When does the algorithm produce an output?
• What does the adversary get to examine?
• How do we define an individual, which we should protect? D vs. D+Me

Data Streams
Data is a stream of items. The sanitizer sees each item and updates its internal state, producing output either on-the-fly or at the end.
[Figure: Data Stream → Sanitizer (state) → output.]

Three New Issues/Concepts
• Continual Observation
  – The adversary gets to examine the output of the sanitizer all the time
• Pan Privacy
  – The adversary gets to examine the internal state of the sanitizer. Once? Several times? All the time?
• "User" vs. "Event" Level Protection

Randomized Response
• Randomized Response Technique [Warner 1965]
  – Method for polling stigmatizing questions; "trust no-one"
  – Idea: lie with known probability.
• Specific answers are deniable
• Aggregate results are still valid
• The data is never stored "in the plain"
Popular in the DB literature [Mishra and Sandler].
[Figure: bits 1, 1, 0 each released + noise.]
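
A minimal sketch of Warner's technique: each respondent tells the truth with a known probability p and lies otherwise, and the aggregate fraction is recovered by inverting the known bias. The population and p below are illustrative:

```python
import random

random.seed(1)

def rr_response(true_bit: int, p: float) -> int:
    """Answer truthfully with probability p, otherwise flip the bit."""
    return true_bit if random.random() < p else 1 - true_bit

def estimate_fraction(responses, p: float) -> float:
    # E[response] = p*f + (1-p)*(1-f); invert to unbias the estimate
    mean = sum(responses) / len(responses)
    return (mean - (1 - p)) / (2 * p - 1)

p = 0.75
true_bits = [1] * 300 + [0] * 700            # 30% of respondents hold the stigma
est = estimate_fraction([rr_response(b, p) for b in true_bits], p)
```

Each individual answer is deniable (it is the truth only with probability p), yet the estimate concentrates around the true 30%.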

Petting the Dynamic Privacy Zoo
[Diagram relating, at user level: User-Level Continual Observation Pan Private, Differentially Private Continual Observation, Pan Private Randomized Response.]

Continual Output Observation
Data is a stream of items. The sanitizer sees each item and updates its internal state, producing an output observable to the adversary.
[Figure: Data Stream → Sanitizer (state) → output.]

Continual Observation
• Alg: an algorithm working on a stream of data
  – Mapping prefixes of data streams to outputs
  – Step i output: σi
Adjacent streams: can get from one to the other by changing one element, e.g. S = acgtbxcde and S' = acgtbycde.
• Alg is ε-differentially private against continual observation if for all adjacent data streams S and S', for all prefixes t and all output sequences σ1, σ2, ..., σt:
    e^{-ε} ≤ Pr[Alg(S) = σ1 σ2 ... σt] / Pr[Alg(S') = σ1 σ2 ... σt] ≤ e^ε ≈ 1+ε

The Counter Problem
0/1 input stream: 01100100000011000000100101...
Goal: a publicly observable counter approximating the total number of 1's so far.
Continual output: at each time period, output the total number of 1's.
Want to hide individual increments while providing reasonable accuracy.

Counters with Continual Output Observation
Data is a stream of 0/1. The sanitizer sees each xi and updates its internal state, producing a value observable to the adversary.
[Figure: input bits 1 0 0 1 1 0 0 0 1 → Sanitizer (state) → outputs 1 1 1 2 ...]

Counters with Continual Output Observation
Continual output: each time period, output the total number of 1's.
Initial idea: at each time period, on input xi ∈ {0, 1}:
• Update the counter by the input xi
• Add independent Laplace noise with magnitude 1/ε
Privacy: since each increment is protected by Laplace noise, differentially private whether xi is 0 or 1.
Accuracy: the noise cancels out; total error Õ(√T), where T is the number of time periods.
For sparse streams this error is too high.
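
The initial idea fits in a few lines. A sketch of the per-step noisy counter (stream density, ε, and length are illustrative):

```python
import random

random.seed(2)

def laplace_noise(scale: float) -> float:
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def noisy_counter(stream, eps: float):
    """At each step, fold the input plus fresh Lap(1/eps) noise into the counter."""
    outs, counter = [], 0.0
    for x in stream:
        counter += x + laplace_noise(1 / eps)
        outs.append(counter)
    return outs

stream = [1 if random.random() < 0.1 else 0 for _ in range(1000)]
outs = noisy_counter(stream, eps=1.0)
```

After T steps the counter carries the sum of T independent Laplace samples, hence the Õ(√T) error, dwarfing the true count when the stream is sparse.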

Why So Inaccurate?
• Operates essentially as in randomized response
  – No utilization of the state
• Problem: we do the same operations when the stream is sparse as when it is dense
  – Want to act differently when the stream is dense
• The times at which the counter is updated are potential leakage

Delayed Updates
Main idea: update the output value only when there is a large gap between the actual count and the output.
We have a good way of outputting the value of the counter once: the actual counter + noise.
Maintain: actual count At (+ noise), current output outt (+ noise), D = update threshold.

Delayed Output Counter
Outt: current output. At: count since last update. Dt: noisy threshold.
If At − Dt > fresh noise, then:
  Outt+1 ← Outt + At + fresh noise
  At+1 ← 0
  Dt+1 ← D + fresh noise
Noise: independent Laplace noise with magnitude 1/ε.
Accuracy:
• For threshold D: w.h.p. update about N/D times
• Total error: (N/D)^{1/2}·noise + D + noise
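
The update rule above can be sketched directly. A sketch under toy parameters (ε, D, and the stream are illustrative; the real analysis needs the two calibrated thresholds discussed earlier):

```python
import random

random.seed(4)

def laplace_noise(scale: float) -> float:
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def delayed_counter(stream, eps: float, D: float):
    """Release a new output only when the count since the last
    update crosses a noisy threshold around D."""
    outs, out, acc = [], 0.0, 0.0
    thresh = D + laplace_noise(1 / eps)
    for x in stream:
        acc += x
        if acc - thresh > laplace_noise(1 / eps):   # noisy comparison
            out += acc + laplace_noise(1 / eps)     # release with fresh noise
            acc = 0.0
            thresh = D + laplace_noise(1 / eps)     # fresh noisy threshold
        outs.append(out)
    return outs

stream = [1 if random.random() < 0.1 else 0 for _ in range(1000)]
outs = delayed_counter(stream, eps=1.0, D=20)
```

On a sparse stream the counter updates only about N/D times, so far fewer noise terms accumulate than with the per-step scheme.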

Privacy of Delayed Output
Recall: if At − Dt > fresh noise, then Outt+1 ← Outt + At + fresh noise and Dt+1 ← D + fresh noise.
Protect: the update time and the update value.
For any two adjacent sequences, e.g. 101101110001 and 101101010001, consider the first update after the difference occurred.
Can pair up noise vectors (η1, η2, ..., ηk−1, ηk, ηk+1, ...) ↔ (η1, η2, ..., ηk−1, η'k, ηk+1, ...), identical in all locations except one, with η'k = ηk + 1.
Prob ≈ e^ε.

Dynamic from Static
Idea: apply the conversion of static algorithms into dynamic ones [Bentley-Saxe 1980].
• Run many accumulators in parallel:
  – each accumulator counts the number of 1's in a fixed segment of time, plus noise
  – an accumulator is measured when the stream is in its time frame
• Value of the output counter at any point in time: sum of the accumulators of a few segments. Only finished segments are used.
• Accuracy: depends on the number of segments in the summation and the accuracy of the accumulators.

The Segment Construction
Based on the bit representation: each point t is in ⌈log t⌉ segments.
Σ_{i=1}^{t} xi = sum of at most log t accumulators.
By setting ε' ≈ ε / log T we get the desired privacy.
Accuracy: with all but negligible (in T) probability, the error at every step t is at most O((log^{1.5} T)/ε).
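
The segment construction can be sketched as the standard binary-tree counter: each input lands in at most log T dyadic segments, each segment is released once with noise, and every prefix sum is the sum of at most log T noisy segments. A sketch consistent with the slide (not the lecture's exact pseudocode):

```python
import random

random.seed(5)

def laplace_noise(scale: float) -> float:
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def binary_counter(stream, eps: float):
    """Each prefix sum is assembled from <= log T noisy dyadic segments."""
    T = len(stream)
    L = T.bit_length()            # number of segment levels
    scale = L / eps               # eps' = eps / log T per segment
    alpha = [0.0] * (L + 1)       # exact partial sums, one per level
    noisy = [0.0] * (L + 1)       # their noisy releases
    outs = []
    for t, x in enumerate(stream, start=1):
        i = (t & -t).bit_length() - 1          # lowest set bit: segment closing now
        alpha[i] = sum(alpha[:i]) + x          # merge all lower segments
        noisy[i] = alpha[i] + laplace_noise(scale)
        for j in range(i):
            alpha[j] = noisy[j] = 0.0
        # the prefix [1, t] decomposes into the segments at the set bits of t
        outs.append(sum(noisy[j] for j in range(L + 1) if (t >> j) & 1))
    return outs

stream = [1] * 64
outs = binary_counter(stream, eps=1.0)
```

At every step at most log T Laplace(log T/ε) terms are summed, matching the O((log^{1.5} T)/ε) error bound on the slide.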

Synthetic Counter
Can make the counter synthetic:
• Monotone
• Each round the counter goes up by at most 1
Apply to any monotone function.

Petting the Dynamic Privacy Zoo
[Diagram relating, at user level: Continual Pan Privacy, Differentially Private Outputs, Privacy under Continual Observation, Pan Privacy, Sketch vs. Stream.]