Global Privacy Guarantee in Serial Data Publishing
Raymond Chi-Wing Wong (1), Ada Wai-Chee Fu (2), Jia Liu (2), Ke Wang (3), Yabo Xu (4)
(1) The Hong Kong University of Science and Technology
(2) The Chinese University of Hong Kong
(3) Simon Fraser University
(4) Sun Yat-sen University
Prepared and presented by Raymond Chi-Wing Wong
Outline
1. Sequential Releases
2. Related Work
   - Local Guarantee
3. Our Proposed Privacy Model
4. Conclusion
1. Sequential Releases
Time = 1. The hospital releases its medical data set to the public.
Hospital medical data:
  Name     PID  Disease
  Raymond  p1   Flu
  Peter    p2   HIV
  Mary     p3   Fever
  Alice    p4   HIV
  Bob      p5   Flu
  John     p6   Fever
The published table satisfies some privacy requirement (e.g., m-invariance).
1. Sequential Releases (Time = 1, Time = 2, Time = 3)
The hospital medical data changes over time through insertions, deletions and updates. At each of time = 1, time = 2 and time = 3, the hospital releases a table to the public, and each published table satisfies some privacy requirement (e.g., m-invariance).
Problem: At the current time t, we want to generate a table that satisfies some privacy requirement (e.g., m-invariance) with respect to all tables published at any time <= t.
1. Sequential Releases
Problem: At the current time t, we want to generate a table that satisfies some privacy requirement (e.g., m-invariance) with respect to all tables published at any time <= t.
Privacy Requirement: Peter would not want anyone to deduce with high confidence, from one or more of these published datasets, that he has ever contracted chlamydia (a sexually transmitted disease, STD) in the past.
Global Guarantee: The probability that Peter is linked to chlamydia in one or more published datasets is at most a given threshold (e.g., 1/2).
Global Guarantee
Privacy Requirement: The probability that Peter is linked to chlamydia in one or more published datasets is at most a given threshold (e.g., 1/2).
- This global guarantee requirement seems quite “obvious” and “natural”.
- No existing work considers this global guarantee requirement.
- Instead, existing work considers another requirement called local guarantee.
Local Guarantee
Privacy Requirement: The probability that Peter is linked to chlamydia in each published dataset is at most a given threshold (e.g., 1/2).
- The probability that Peter is linked to chlamydia in the dataset at time = 1 is at most a given threshold (e.g., 1/2).
- The probability that Peter is linked to chlamydia in the dataset at time = 2 is at most a given threshold (e.g., 1/2).
- The probability that Peter is linked to chlamydia in the dataset at time = 3 is at most a given threshold (e.g., 1/2).
2. Related Work
Local Guarantee
- m-invariance: Xiao et al., “m-invariance: Towards Privacy Preserving Republication of Dynamic Datasets”, SIGMOD, 2007
- l-scarcity: Bu et al., “Privacy Preserving Serial Data Publishing by Role Composition”, VLDB, 2008
Contribution
- We are the first to propose the global guarantee requirement.
- We prove that global guarantee is a stronger requirement than local guarantee.
Global Guarantee
Privacy Requirement: The probability that Peter is linked to chlamydia in one or more published datasets is at most a given threshold (e.g., 1/2).
- How can we calculate the probability?
- From the published datasets, we derive a formula based on possible world analysis.
- We skip the details.
Property
Theorem: Global guarantee is a stronger privacy requirement than local guarantee. If the published tables satisfy global guarantee, then they satisfy local guarantee.
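One way to see why the implication in the theorem is plausible is the following short derivation. It is only a sketch, not the proof from the paper. The symbols E_t (the event that Peter is linked to chlamydia in the table published at time t), T (the current time) and \theta (the threshold) are notation introduced here for illustration. It also assumes that the linkage probability for release t is the same whether it is computed from that release alone or as a marginal over the possible worlds of all releases, as in the example used in these slides.

```latex
E_t \subseteq \bigcup_{s \le T} E_s
\quad\Longrightarrow\quad
\Pr[E_t] \;\le\; \Pr\Big[\bigcup_{s \le T} E_s\Big] \;\le\; \theta
\qquad \text{for every } t \le T .
```

The converse does not hold: each \Pr[E_t] can be at most \theta while the probability of the union exceeds \theta.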
Our Algorithm
- How can we generate tables such that they satisfy global guarantee?
- Idea: Large group size
5. Conclusion
- We are the first to propose global guarantee.
- Global guarantee is a stronger privacy requirement than local guarantee.
Q&A
In the following, I will elaborate on two concepts:
- Local Guarantee (e.g., m-invariance)
- Global Guarantee
Time = 1
Hospital medical data:
  Name     Sex  Zipcode  Disease
  Raymond  M    65001    flu
  Peter    M    65002    chlamydia
  Mary     F    65014    flu
  Alice    F    65015    fever
Published data (released to the public):
  Sex  Zipcode  Disease
  M    65001    flu
  M    65002    chlamydia
  F    65014    flu
  F    65015    fever
Public voter registration list:
  Name     Sex  Zipcode
  Raymond  M    65001
  Peter    M    65002
  Mary     F    65014
  Alice    F    65015
  Emily    F    65010
Time = 1: Generalization
Published data (released to the public), with Zipcode generalized:
  Sex  Zipcode  Disease
  M    6500*    flu
  M    6500*    chlamydia
  F    6501*    flu
  F    6501*    fever
- 2-diversity: Each individual is linked to “chlamydia” with probability at most 1/2 in THIS PUBLISHED TABLE. 2-diversity only focuses on ONE-TIME publishing.
- 2-invariance focuses on MULTIPLE-TIME publishing. It also makes use of the idea of 2-diversity.
- Idea: Each individual is linked to “chlamydia” with probability at most 1/2 for each of the MULTIPLE PUBLISHED TABLES.
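The one-time check described on this slide can be sketched in Python. This is a minimal illustration, assuming that every tuple in a generalized group is equally likely to belong to any individual in that group; the function name `max_link_probability` and the tuple layout are hypothetical and not taken from the slides.

```python
from collections import Counter, defaultdict
from fractions import Fraction

# Published table at time = 1 after generalization: (Sex, Zipcode, Disease).
published_t1 = [
    ("M", "6500*", "flu"),
    ("M", "6500*", "chlamydia"),
    ("F", "6501*", "flu"),
    ("F", "6501*", "fever"),
]

def max_link_probability(table):
    """Return the largest fraction of any single disease within any group.

    A group is the set of tuples sharing the same (Sex, Zipcode) values.
    Under the uniform assumption, this fraction is the probability that an
    individual in the group is linked to that disease.
    """
    groups = defaultdict(Counter)
    for sex, zipcode, disease in table:
        groups[(sex, zipcode)][disease] += 1
    return max(
        Fraction(count, sum(counter.values()))
        for counter in groups.values()
        for count in counter.values()
    )

# The generalized table keeps every linkage probability at or below 1/2.
print(max_link_probability(published_t1) <= Fraction(1, 2))
```

Applied to the ungeneralized table (exact zipcodes), every group has a single tuple, so the same function returns 1, which is why generalization is needed.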
2-invariance
Time = 1. Published data, with the individual behind each tuple shown for illustration:
  Name     Sex  Zipcode  Disease
  Raymond  M    6500*    flu
  Peter    M    6500*    chlamydia
  Mary     F    6501*    flu
  Alice    F    6501*    fever
Signature of each individual:
  Name     Signature
  Raymond  {flu, chlamydia}
  Peter    {flu, chlamydia}
  Mary     {flu, fever}
  Alice    {flu, fever}
2-invariance
Time = 2. The hospital medical data has changed:
  Name     Sex  Zipcode  Disease
  Raymond  M    65001    chlamydia
  Peter    M    65002    flu
  Mary     F    65014    fever
  Emily    F    65010    flu
Published data at time = 2 (released to the public):
  Sex  Zipcode  Disease
  M    6500*    chlamydia
  M    6500*    flu
  F    6501*    fever
  F    6501*    flu
2-invariance
Signatures at time = 1:
  Name     Signature
  Raymond  {flu, chlamydia}
  Peter    {flu, chlamydia}
  Mary     {flu, fever}
  Alice    {flu, fever}
Signatures at time = 2:
  Name     Signature
  Raymond  {flu, chlamydia}
  Peter    {flu, chlamydia}
  Mary     {flu, fever}
  Emily    {flu, fever}
The published tables satisfy 2-invariance, because each individual is linked to the SAME signature.
Idea of 2-invariance: Each individual is linked to the SAME signature in each published table.
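The signature condition on this slide can be checked mechanically. The following Python sketch is a simplified check, not the full m-invariance definition or the algorithm from Xiao et al. It assumes each release is given as a list of groups, where a group is a pair (set of member names, list of published diseases); the names `signature_of` and `is_m_invariant` are hypothetical.

```python
# Each release is a list of groups: (members, diseases published for the group).
release_t1 = [
    ({"Raymond", "Peter"}, ["flu", "chlamydia"]),
    ({"Mary", "Alice"}, ["flu", "fever"]),
]
release_t2 = [
    ({"Raymond", "Peter"}, ["chlamydia", "flu"]),
    ({"Mary", "Emily"}, ["fever", "flu"]),
]

def signature_of(release):
    """Map each individual to the set of diseases in his or her group."""
    sig = {}
    for members, diseases in release:
        for name in members:
            sig[name] = frozenset(diseases)
    return sig

def is_m_invariant(releases, m):
    """Simplified check: every group has at least m tuples with distinct
    diseases, and every individual keeps the same signature in all releases
    in which he or she appears."""
    seen = {}
    for release in releases:
        for members, diseases in release:
            if len(diseases) < m or len(set(diseases)) != len(diseases):
                return False
        for name, sig in signature_of(release).items():
            # setdefault records the first signature seen for this individual.
            if seen.setdefault(name, sig) != sig:
                return False
    return True

print(is_m_invariant([release_t1, release_t2], 2))
```

Alice (deleted at time = 2) and Emily (inserted at time = 2) each appear in only one release, so the signature comparison constrains only Raymond, Peter and Mary.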
2-invariance
2-invariance provides the local guarantee: the probability that an individual is linked to chlamydia in each of the published datasets is at most 1/2. Why? Possible world analysis.
Published data for the male group (Raymond, M, 65001 and Peter, M, 65002 in the voter registration list):
  Time = 1: (M, 6500*, flu), (M, 6500*, chlamydia)
  Time = 2: (M, 6500*, chlamydia), (M, 6500*, flu)
Possible world analysis based on the published table at time = 1 only. There are two possibilities for the male group:
  Sex  Zipcode  Disease
  M    65001    flu
  M    65002    chlamydia
or
  Sex  Zipcode  Disease
  M    65001    chlamydia
  M    65002    flu
Possible world analysis based on the published table at time = 2 only. There are again two possibilities for the male group:
  Sex  Zipcode  Disease
  M    65001    flu
  M    65002    chlamydia
or
  Sex  Zipcode  Disease
  M    65001    chlamydia
  M    65002    flu
Possible world analysis based on both published tables. Combining the two possibilities at time = 1 with the two possibilities at time = 2 gives four possible worlds (World 1 to World 4). Each world pairs one assignment for time = 1 with one assignment for time = 2, where an assignment is either (65001: flu, 65002: chlamydia) or (65001: chlamydia, 65002: flu).
Local guarantee at time = 1: Peter (the second individual, M, 65002) is linked to chlamydia at time = 1 in two of the four possible worlds. In the published data at time = 1, Prob(Peter is linked to chlamydia) = 2/4 = 1/2.
Local guarantee at time = 2: Peter is linked to chlamydia at time = 2 in two of the four possible worlds. In the published data at time = 2, Prob(Peter is linked to chlamydia) = 2/4 = 1/2.
2-invariance provides the local guarantee: the probability that an individual is linked to chlamydia in each of the published datasets is at most 1/2.
Global Guarantee: The probability that an individual is linked to chlamydia in one or more published datasets is at most 1/2.
Question: what is Prob(the second individual (i.e., Peter) is linked to chlamydia in one or more published datasets)? We check World 1 to World 4 in turn.
Result: Peter is linked to chlamydia at time = 1 or at time = 2 in three of the four possible worlds, and in neither release in the remaining world. Hence
Prob(the second individual (i.e., Peter) is linked to chlamydia in one or more published datasets) = 3/4.
This value is larger than 1/2, so these 2-invariant tables satisfy the local guarantee but not the global guarantee.
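The possible world analysis for this example can be reproduced with a few lines of Python. This is a minimal sketch for the two-person male group only, assuming that all four possible worlds are equally likely and that the assignments at time = 1 and time = 2 combine freely; the names `worlds`, `local_probability` and `global_probability` are hypothetical, and the general formula mentioned in the slides is more complicated than this enumeration.

```python
from fractions import Fraction
from itertools import permutations, product

males = ("Raymond", "Peter")
# Diseases published for the male group (M, 6500*) at each time.
published = {1: ("flu", "chlamydia"), 2: ("chlamydia", "flu")}

def worlds_for_release(diseases):
    """All distinct ways to assign the published diseases to the group."""
    return [dict(zip(males, p)) for p in sorted(set(permutations(diseases)))]

per_release = [worlds_for_release(published[t]) for t in sorted(published)]
# A possible world picks one assignment per release: 2 x 2 = 4 worlds.
worlds = list(product(*per_release))

def local_probability(release_index, person="Peter", disease="chlamydia"):
    """Probability that `person` is linked to `disease` in one release."""
    hits = sum(1 for w in worlds if w[release_index][person] == disease)
    return Fraction(hits, len(worlds))

def global_probability(person="Peter", disease="chlamydia"):
    """Probability that `person` is linked to `disease` in one or more releases."""
    hits = sum(1 for w in worlds if any(r[person] == disease for r in w))
    return Fraction(hits, len(worlds))

print(local_probability(0), local_probability(1), global_probability())
```

Each local probability meets the 1/2 threshold, while the probability over one or more releases does not, which is the gap between the two guarantees.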
- We have illustrated with an example how to derive the probability that an individual is linked to chlamydia (for both local guarantee and global guarantee).
- In fact, the general formula is much more complicated.
Theorem: Global guarantee is a stronger privacy requirement than local guarantee. If the published tables satisfy global guarantee, then they satisfy local guarantee.
- How can we generate tables such that they satisfy global guarantee?
- Idea: Large group size
Global Guarantee
With a larger group size, all four tuples of each release are generalized into a single group (Sex = M/F, Zipcode = 650**).
Published data at time = 1:
  Sex  Zipcode  Disease
  M/F  650**    flu
  M/F  650**    chlamydia
  M/F  650**    flu
  M/F  650**    fever
Published data at time = 2:
  Sex  Zipcode  Disease
  M/F  650**    chlamydia
  M/F  650**    flu
  M/F  650**    fever
  M/F  650**    flu
Prob(the second individual (i.e., Peter) is linked to chlamydia in one or more published datasets) = 7/16.
This value is smaller than 1/2.
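The value 7/16 can be checked by enumerating possible worlds for the single large group. This is a hedged sketch, assuming that every assignment of the published diseases to the four group members is equally likely and that the two releases combine freely; the group memberships follow the hospital data in the earlier slides, and the function name `global_probability` is hypothetical.

```python
from fractions import Fraction
from itertools import permutations, product

# One group per release: (members of the group, diseases published for it).
releases = [
    (("Raymond", "Peter", "Mary", "Alice"), ("flu", "chlamydia", "flu", "fever")),
    (("Raymond", "Peter", "Mary", "Emily"), ("chlamydia", "flu", "fever", "flu")),
]

def global_probability(releases, person, disease):
    """Probability that `person` is linked to `disease` in one or more releases."""
    per_release = [
        [dict(zip(members, perm)) for perm in permutations(diseases)]
        for members, diseases in releases
    ]
    # A possible world picks one assignment for each release.
    worlds = list(product(*per_release))
    hits = sum(
        1 for w in worlds
        if any(assignment.get(person) == disease for assignment in w)
    )
    return Fraction(hits, len(worlds))

p = global_probability(releases, "Peter", "chlamydia")
print(p, p <= Fraction(1, 2))
```

Under these assumptions the result agrees with 1 - (3/4)^2: Peter avoids chlamydia with probability 3/4 in each release, so a larger group lowers the probability over one or more releases.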
2-invariance (Local Guarantee)
Published data at time = 1:
  Sex  Zipcode  Disease
  M    6500*    flu
  M    6500*    chlamydia
  F    6501*    flu
  F    6501*    fever
Published data at time = 2:
  Sex  Zipcode  Disease
  M    6500*    chlamydia
  M    6500*    flu
  F    6501*    fever
  F    6501*    flu
Prob(the second individual (i.e., Peter) is linked to chlamydia in one or more published datasets) = 3/4.
This value is larger than 1/2.