A model for data revelation
Poorvi Vora
Dept. of Computer Science
George Washington University
Poorvi Vora/CS/GWU

“Security” frameworks
Binary
• Divide the world into trusted and untrusted parties
• Provide complete revelation of information or complete protection
E.g. multiparty computation, encrypted data
9/23/2021 Poorvi Vora/CS/GWU 2

Even a statistic or aggregate reveals “private” information
Secure multiparty computation reveals $f(x_1, x_2, \ldots, x_n)$
And nothing more.
Yet, this reveals information about all $x_i$
Thus, typical security assurances are not enough

What is privacy
• Control over information
• Extent of information revelation
Tensions between:
Access to aggregate information for community vs. individual control
Reputation vs. prejudice

Individual control requires more than binary security of personal information
Information is often given up for something in return
– Safeway card
– Monthly charge to be kept off phone books
– Information for community statistics:
• Health statistics
• Collaborative filtering/personalization in virtual communities

A model: introduce uncertainty
Maximum uncertainty (i.e. secrecy) corresponds to crypto protocols
• Alice and Bob determine:
– a binary data point from Alice’s personal information, x
– a probability of truth, p
– a return, y
• Alice reveals a variable z = x with probability p
• Bob provides, in return, y
• z exists in the ether as Alice’s value x with probability p
This is not mutually exclusive with cryptographic protection (p = 0.5 is cryptographic)
Used in public health community for twenty-odd years
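The revelation step of this model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide says z = x with probability p, and the sketch assumes that z is the flipped bit 1 - x otherwise (the usual randomized-response convention). The function names `reveal` and `estimate_proportion` are hypothetical, and the estimator shows how a community statistic could still be recovered from many noisy answers.

```python
import random


def reveal(x: int, p: float, rng: random.Random) -> int:
    """Return z = x with probability p, otherwise the flipped bit 1 - x.

    Assumption: the slide only states the "truth" branch; flipping the bit
    in the other branch is the standard randomized-response choice.
    """
    if x not in (0, 1):
        raise ValueError("x must be a binary data point (0 or 1)")
    return x if rng.random() < p else 1 - x


def estimate_proportion(responses: list[int], p: float) -> float:
    """Estimate the fraction of true 1s in a population from noisy responses.

    Uses E[z] = p*pi + (1 - p)*(1 - pi), solved for pi. At p = 0.5 the
    responses are independent of x, so no estimate exists.
    """
    if abs(2 * p - 1) < 1e-12:
        raise ValueError("p = 0.5 reveals nothing; cannot estimate")
    mean_z = sum(responses) / len(responses)
    return (mean_z - (1 - p)) / (2 * p - 1)


if __name__ == "__main__":
    rng = random.Random(0)
    truth = [1] * 300 + [0] * 700  # hypothetical population, 30% ones
    noisy = [reveal(x, p=0.8, rng=rng) for x in truth]
    print(round(estimate_proportion(noisy, p=0.8), 3))
```

With p = 1 the true value is always revealed, and with p = 0.5 each answer is a fair coin flip, which matches the slide's remark that p = 0.5 corresponds to cryptographic secrecy.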

Outcome
Protocol is a mathematical game between Alice and Bob
Optimal situation is not when no information is revealed, but when Alice gets maximum benefit for her information
Think about this: should women in Africa test for HIV when they will certainly not obtain any treatment for it?

An analogy
• The protocol is a communication channel
• The sender is Alice, the receiver (malicious?) Bob
• The probability of error is the probability of a lie

Security properties of randomization
• Repeated queries: Error $\to 0$ as $n \to \infty$, and $n \to \infty$ as Error $\to 0$
• Cost to attacker increases without bound if error is not bounded away from zero
• This is a repetition code over the channel
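The repeated-query attack can be made concrete with a small calculation. The sketch below assumes that Bob asks the same binary question n times, that each answer is a lie independently with probability 1 - p, and that Bob decodes by majority vote; the independence assumption and the function name `majority_error` are illustrative choices, not taken from the slides.

```python
from math import comb


def majority_error(n: int, lie_prob: float) -> float:
    """Probability that a majority vote over n independent answers is wrong.

    n is required to be odd so that ties cannot occur. The vote is wrong
    exactly when more than half of the answers are lies.
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    threshold = n // 2 + 1  # number of lies needed to flip the majority
    return sum(
        comb(n, k) * lie_prob**k * (1 - lie_prob) ** (n - k)
        for k in range(threshold, n + 1)
    )


if __name__ == "__main__":
    # Error shrinks as the number of repeated queries grows.
    for n in (1, 3, 11, 51):
        print(n, majority_error(n, lie_prob=0.25))
```

For a lie probability below 0.5 the error falls toward zero as n grows, but never reaches it for finite n, which is the sense in which the attacker's cost grows without bound.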

Other attacks
Query 1: Graying?
Query 2: Balding?
Query 3: Weight?
Query 4: Sports?
Really asking about age and gender
How does one characterize all such attacks?
What can one say about security wrt such attacks?

An analogy
• The protocol is a communication channel
• The sender is Alice, the receiver (malicious?) Bob
• The probability of error is the probability of a lie
• The attributes that Bob wants to determine form the message

A simple attack
• Query 1: Female?
• Query 2: Over 40?
• Query 3: Losing Calcium?
Query 3 checks the answers to Queries 1 and 2
It is a parity-check bit
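A minimal sketch of how a third query can check the first two, under a strong simplifying assumption: the third attribute is modeled as the exact XOR of the first two, so that it behaves like a true parity bit. In the slide's example the third query is only correlated with the other two, so this is an idealization, and the names `parity_bit` and `answers_consistent` are hypothetical.

```python
def parity_bit(x1: int, x2: int) -> int:
    """Idealized third attribute: the XOR of the first two binary attributes."""
    return x1 ^ x2


def answers_consistent(z1: int, z2: int, z3: int) -> bool:
    """True when the three received answers satisfy the parity check.

    A failed check tells Bob that an odd number of the answers were lies;
    it does not tell him which one.
    """
    return parity_bit(z1, z2) == z3


if __name__ == "__main__":
    x1, x2 = 1, 0
    x3 = parity_bit(x1, x2)
    print(answers_consistent(x1, x2, x3))      # truthful answers
    print(answers_consistent(1 - x1, x2, x3))  # one lie on query 1
```

A single parity bit only detects an odd number of lies; correcting them needs more redundant queries.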

An analogy
• All attacks are communication over channel
• Good attacks are codes
• What Bob queries is a codeword bit
• What he receives is the transmitted codeword that he decodes

Shannon’s theorems apply
In fact, assuming any functions of Alice’s data points as queries (adaptive, related queries) and error probability $\to 0$ as $n \to \infty$
The number of queries required per bit of entropy is asymptotically tightly bounded below by the inverse of the channel capacity
Above this bound, error tends exponentially to 0
Below it, error increases exponentially with n
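The bound on queries per bit can be computed for a concrete channel. The sketch below assumes the protocol behaves as a binary symmetric channel in which each answer is true with probability p, so that the capacity is 1 - H(p) bits per query; the slides do not name the channel model, so this is an assumption, and the function names are hypothetical.

```python
from math import log2


def binary_entropy(q: float) -> float:
    """Binary entropy H(q) in bits, with H(0) = H(1) = 0."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    if q in (0.0, 1.0):
        return 0.0
    return -q * log2(q) - (1 - q) * log2(1 - q)


def channel_capacity(p: float) -> float:
    """Capacity in bits per query of a binary symmetric channel
    whose answers are true with probability p (assumed channel model)."""
    return 1.0 - binary_entropy(p)


def min_queries_per_bit(p: float) -> float:
    """Asymptotic lower bound on queries needed per bit of entropy: 1 / C."""
    capacity = channel_capacity(p)
    if capacity == 0.0:
        return float("inf")  # p = 0.5: no number of queries suffices
    return 1.0 / capacity


if __name__ == "__main__":
    for p in (0.5, 0.6, 0.75, 0.9, 1.0):
        print(p, channel_capacity(p), min_queries_per_bit(p))
```

As p approaches 0.5 the capacity goes to zero and the required number of queries per bit grows without bound.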

Questions
• How does one determine the entropy of a particular data set, or a general data set?
• What kinds of attacks are computationally feasible?
• This was a very powerful attacker. What are reasonable limits on the attacker’s abilities?
• Result in itself, independent of model.
• Partly published at Int. Symp. Info. Theory, 2003
• Journal paper in review, at website

Value-free model
• Human rights aspects covered through crypto protocols
• Necessary health information and community information can be gathered
• Consumer behaviour treated through this game
• Criticism: very adversarial model

Another application: anonymous delivery
Crowds: Reiter and Rubin/Lucent and AT&T
At node i+1: node i more likely than any other
[Figure: crowd of nodes A, B, C, D, E forwarding a message]
N nodes; $p_f$ = probability of forwarding
Message: sending node
Received symbol: Node i
Receiver: Node i+1
Channel characteristic: probability that true sender is Node i, probability that other nodes are senders
Traffic analysis/data mining: correlations among senders (communication across channel, less efficient than some error-correcting code)

An example of model use to measure the value of information
with Yu-An Sun and Sumit Joshi
• Auction bids reveal much about an individual’s profile
• Consider the Vickrey (sealed, second-highest-bid) auction
– Optimal strategy: to bid one’s valuation
– Bids (and hence valuations) can be protected with secure multiparty computation
– But, bids allow determination of market demand (efficient markets)
– Need for an aggregate value, not well-defined at the moment of the auction
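For reference, the outcome rule of a standard Vickrey auction can be sketched as follows. This is the plain sealed-bid, second-price rule only, not the variably private variant; the function name `vickrey_outcome`, the bidder names, and the tie-breaking rule (earliest listed bidder wins) are illustrative assumptions.

```python
def vickrey_outcome(bids: dict[str, float]) -> tuple[str, float]:
    """Return (winner, price) for a sealed-bid second-price auction.

    The highest bidder wins and pays the second-highest bid. Ties are
    broken in favour of the bidder listed first (an arbitrary choice).
    """
    if len(bids) < 2:
        raise ValueError("a second-price auction needs at least two bids")
    # sorted() is stable, so equal bids keep their original order.
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1]
    return winner, price


if __name__ == "__main__":
    print(vickrey_outcome({"alice": 10.0, "bob": 7.0, "carol": 9.0}))
```

Because the price depends only on the other bids, bidding one's true valuation is the optimal strategy, which is why the bids reveal so much about each bidder.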

Variably Private Vickrey – Bidding Round
Introduce uncertainty
• The seller announces a minimum sale price and a maximum randomization setting.
• Each bidder submits a sealed interval containing her bid. The size of the interval is her choice.
• Each bidder is in the running with the high end of her interval, and committed to the low end

Variably Private Vickrey – Revealing Round
• Bidders not in the running will reveal no more information on their valuations.
• Largest of the others will reveal which half of their interval contains their valuation

Sale Price
[Figure: sale-price diagram with labels “Seller gets”, “Buyer pays”, and “Divided among all bidders proportional to the interval width”]

Properties?
• Provides various demand statistics
• In general, accuracy of future bid estimation is lower for more uncertainty
• Allows for bidder to vary uncertainty, and pay for it
• Allows seller to obtain more than regular Vickrey, depending on how much information is valued
• Bidder with highest valuation still wins auction as long as she can tolerate revealing her valuation to the extent required.

Summary
A model that we hope will:
– Provide choices not currently typically available to users
– Extend the security framework to include problems like those in statistical databases
– Provide a means of measuring uncertainty in situations where there is some uncertainty, rather than none or complete
– Include other leakage from security-related protocols such as anonymous delivery and ciphers
– Be useful for measuring the economic value of information