Privacy Policy, Law and Technology
Data Privacy
October 30, 2008
CMU Usable Privacy and Security Laboratory
http://cups.cmu.edu/
k-anonymity
§ “A release provides k-anonymity protection if the information for each person contained in the release cannot be distinguished from at least k-1 individuals whose information also appears in the release.”
§ k = number of individuals to which a pattern of data (quasi-identifiers) may be attributed
http://privacy.cs.cmu.edu/people/sweeney/kanonymity.html
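A minimal sketch of how the definition above could be checked on a small table. The function name `k_of_release`, the field names, and the sample records are assumptions made for illustration; they do not come from the slides or from Sweeney's work.

```python
from collections import Counter

def k_of_release(records, quasi_identifiers):
    """Return the largest k for which the records are k-anonymous.

    Records are grouped by their quasi-identifier values; the smallest
    group size is the k that the release actually provides.
    """
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values()) if groups else 0

# Hypothetical release: zip code truncated, age generalized to a range.
records = [
    {"zip": "152**", "age": "20-29", "diagnosis": "flu"},
    {"zip": "152**", "age": "20-29", "diagnosis": "asthma"},
    {"zip": "152**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "152**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "152**", "age": "30-39", "diagnosis": "diabetes"},
]

k = k_of_release(records, ["zip", "age"])
print(k)  # smallest group has 2 records, so the release is 2-anonymous
```

The result depends entirely on which attributes are treated as quasi-identifiers; adding an attribute to that list can only keep k the same or lower it.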
l-diversity
§ Large values of k may be insufficient to protect privacy when records with the same quasi-identifiers do not have a diverse set of values for their sensitive elements
– Example:
• A table of medical records may use truncated zip code and age range as quasi-identifiers, and may be k-anonymized such that there are at least k records for every combination of quasi-identifiers
• For some sets of quasi-identifiers, all patients have the same diagnosis or a small number of diagnoses
§ The l-diversity principle adds the requirement that there be at least l distinct values for the sensitive elements among records that share the same quasi-identifiers
– Example:
• For every zip/age combination, there must be at least 5 different diagnoses
Machanavajjhala, A., Kifer, D., Gehrke, J., and Venkitasubramaniam, M. 2007. L-diversity: Privacy beyond k-anonymity. ACM Trans. Knowl. Discov. Data 1, 1 (Mar. 2007), 3. DOI= http://doi.acm.org/10.1145/1217299.1217302
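The weakness described above can be seen in a short sketch. It uses the simple "at least l distinct values" reading given on this slide; the function name `l_of_release`, the field names, and the sample table are assumptions for illustration only.

```python
from collections import defaultdict

def l_of_release(records, quasi_identifiers, sensitive):
    """Return the smallest number of distinct sensitive values found in
    any group of records that share the same quasi-identifier values."""
    values = defaultdict(set)
    for r in records:
        key = tuple(r[q] for q in quasi_identifiers)
        values[key].add(r[sensitive])
    return min((len(v) for v in values.values()), default=0)

# Hypothetical table: every zip/age group holds 3 records (3-anonymous),
# but everyone in the second group has the same diagnosis.
table = [
    {"zip": "152**", "age": "20-29", "diagnosis": "flu"},
    {"zip": "152**", "age": "20-29", "diagnosis": "asthma"},
    {"zip": "152**", "age": "20-29", "diagnosis": "diabetes"},
    {"zip": "152**", "age": "30-39", "diagnosis": "cancer"},
    {"zip": "152**", "age": "30-39", "diagnosis": "cancer"},
    {"zip": "152**", "age": "30-39", "diagnosis": "cancer"},
]

l = l_of_release(table, ["zip", "age"], "diagnosis")
print(l)  # 1: anyone known to be in the 30-39 group has their diagnosis revealed
```

In this table, knowing only that a person is in the 30-39 group is enough to learn the diagnosis, even though no individual record can be singled out.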
De-identification and re-identification
§ Simplistic de-identification: remove obvious identifiers
§ Better de-identification: also k-anonymize and/or use statistical confidentiality techniques
§ Re-identification can occur through linking entries within the same database or to entries in external databases
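A minimal sketch of re-identification by linking to an external database. The release has had names removed (simplistic de-identification), but still shares zip code, birth year, and sex with a hypothetical public list. All names, fields, and records are invented for illustration.

```python
def link(release, external, quasi_identifiers, sensitive):
    """Join a de-identified release to an external list on shared
    attributes; report rows that match exactly one named person."""
    index = {}
    for person in external:
        key = tuple(person[q] for q in quasi_identifiers)
        index.setdefault(key, []).append(person["name"])

    reidentified = []
    for row in release:
        key = tuple(row[q] for q in quasi_identifiers)
        names = index.get(key, [])
        if len(names) == 1:  # a unique match pins the row to one person
            reidentified.append((names[0], row[sensitive]))
    return reidentified

# Hypothetical de-identified medical release (names already removed).
release = [
    {"zip": "15213", "birth_year": 1975, "sex": "F", "diagnosis": "asthma"},
    {"zip": "15217", "birth_year": 1980, "sex": "M", "diagnosis": "flu"},
]

# Hypothetical external list that includes names.
external = [
    {"name": "Alice Example", "zip": "15213", "birth_year": 1975, "sex": "F"},
    {"name": "Bob Example", "zip": "15217", "birth_year": 1980, "sex": "M"},
    {"name": "Carl Example", "zip": "15217", "birth_year": 1980, "sex": "M"},
]

matches = link(release, external, ["zip", "birth_year", "sex"], "diagnosis")
print(matches)  # [('Alice Example', 'asthma')]
```

The second release row matches two people in the external list, so it is not uniquely re-identified; that ambiguity is what k-anonymity tries to guarantee for every row.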
Examples
§ When RFID tags are sewn into every garment, how might we use this to identify and track people?
§ What if the tags are partially killed so only the product information is broadcast, not a unique ID?
§ How can a cellular provider identify an anonymous pre-paid cell phone user?
§ Other examples?
Techniques for protecting privacy
§ Best
– No collection of contact information
– No collection of long-term personal characteristics
– k-anonymity with large value of k or l-diversity with large value of l
§ Good
– No unique identifiers across databases
– No common attributes across databases
– Random identifiers
– Contact information stored separately from profile or transaction information
– Collection of long-term personal characteristics at a low level of granularity
– Technically enforced deletion of profile details at regular intervals
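Two of the "Good" techniques above, random identifiers and low-granularity collection, can be sketched together. The function name `coarsen`, the field names, and the choice of a 3-digit zip prefix and 10-year age ranges are assumptions for illustration, not recommendations from the slides.

```python
import secrets

def coarsen(record):
    """Build a profile row with no contact information, a random
    identifier, and coarse versions of long-term characteristics."""
    decade = (record["age"] // 10) * 10
    return {
        # Fresh random identifier; not derived from the person, so it
        # cannot be used to join this row with other databases.
        "id": secrets.token_hex(8),
        "zip": record["zip"][:3] + "**",  # keep only the zip prefix
        "age": f"{decade}-{decade + 9}",  # 10-year age range
    }

# Hypothetical input record; the name is dropped, not stored.
profile = coarsen({"name": "Alice Example", "zip": "15213", "age": 34})
print(profile["zip"], profile["age"])  # 152** 30-39
```

Because each call draws a new identifier, the same person stored in two databases would get unrelated IDs, which is one way to avoid unique identifiers across databases.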
Homework 5 discussion
§ http://cups.cmu.edu/courses/privpolawtech-fa08/hw/hw5.html