GenoGuard: Protecting Genomic Data against Brute-Force Attacks
36th IEEE Symposium on Security and Privacy, May 18, 2015
Zhicong Huang (1), Erman Ayday (2), Jacques Fellay (3), Jean-Pierre Hubaux (1), Ari Juels (4)
(1) School of Computer and Communication Sciences, EPFL; (2) Bilkent University; (3) School of Life Sciences, EPFL; (4) Cornell Tech (Jacobs)
The Genomic Avalanche Is Coming…
Things Are Moving
• CS giants have started proposing genome-related services
  o Google Genomics (API to store, process, explore, and share DNA data)
  o IBM Research (computational genomics)
  o Microsoft Research (genomic research in collaboration with the Sanger Center)
  o Apple (the ResearchKit program)
• Global Alliance for Genomics & Health
  o Definition of a common framework for effective, responsible, and secure sharing of genomic and clinical data
  o Security Working Group: security infrastructure, policy, and technology (http://genomicsandhealth.org/our-work/working-groups/securityworking-group/work-products)
Background: Genomics
Genomics Background
• Single Nucleotide Variant (SNV): 4 million SNVs per individual, a subset of the 50 million SNVs discovered so far
• Major allele (0), minor allele (1)
• SNVs are correlated with one another
• Genotype data (to be protected): consider one pair of chromosomes (out of the 23 pairs in the human genome); each SNV position is encoded as the number of minor alleles (0, 1, or 2)
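Here is a minimal sketch of the genotype encoding above. It assumes each SNV position is given as the pair of alleles seen on the two chromosomes and that the minor allele at each position is known. The function name and the sample genotypes are hypothetical.

```python
def minor_allele_count(genotype, minor_allele):
    """Encode one SNV position of a chromosome pair as 0, 1, or 2:
    the number of copies of the minor allele across the two chromosomes."""
    if len(genotype) != 2:
        raise ValueError("expected one allele per chromosome of the pair")
    return sum(allele == minor_allele for allele in genotype)

# Hypothetical genotypes at three SNV positions, each with its minor allele
positions = [(("A", "A"), "G"), (("A", "G"), "G"), (("T", "T"), "T")]
encoded = [minor_allele_count(g, minor) for g, minor in positions]
```

The resulting list holds one value from {0, 1, 2} per SNV, which is the form of genotype data GenoGuard protects.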
Genomic Privacy
• High sensitivity
  o Predisposition to disease
  o Genetic discrimination: denial of access to health insurance, education, and employment [Image: GATTACA, 1997 movie]
• Long-term data value
  o But attackers' computing power keeps increasing: can the protection survive longer?
Background: Honey Encryption [1]
[1] A. Juels, T. Ristenpart. Honey Encryption: Security Beyond the Brute-Force Bound. EUROCRYPT, 2014.
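Honey Encryption
• Conventional encryption: the correct password decrypts to the message Gene_Q, while a wrong password yields gibberish such as "eiKangLpkandlf" or "ddUoIOkesLhKnb"
• Honey encryption: the correct password decrypts to Gene_Q, while a wrong password yields a plausible message such as Gene_R
Honey encryption mitigates the threat of brute-force attacks: every password guess produces a plausible message, so the attacker cannot recognize the correct one.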
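The contrast can be made concrete with a toy honey-encryption sketch in Python, the language of the GenoGuard prototype. This is not the construction from [1] or from GenoGuard. The one-byte seed layout, the XOR with a PBKDF2-derived pad, and all function names are illustrative assumptions. The sketch does show the key property: decrypting with any password yields a valid seed, so every guess decodes to a plausible message.

```python
import hashlib
import os

# Toy DTE over one-byte seeds (hypothetical layout matching the slide's distribution):
# seeds 0-63 -> Gene_Q (1/4), 64-127 -> Gene_R (1/4), 128-255 -> Gene_S (1/2)
RANGES = {"Gene_Q": (0, 64), "Gene_R": (64, 128), "Gene_S": (128, 256)}

def dte_encode(message):
    lo, hi = RANGES[message]
    # uniform choice within the message's range (64 and 128 both divide 256)
    return lo + os.urandom(1)[0] % (hi - lo)

def dte_decode(seed):
    for message, (lo, hi) in RANGES.items():
        if lo <= seed < hi:
            return message
    raise ValueError("seed out of range")

def password_pad(password, salt):
    # Stand-in for password-based encryption: one pad byte derived with PBKDF2
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 10_000, dklen=1)[0]

def he_encrypt(message, password):
    salt = os.urandom(16)
    return salt, dte_encode(message) ^ password_pad(password, salt)

def he_decrypt(ciphertext, password):
    salt, c = ciphertext
    # every password yields some seed in 0-255, and every seed decodes to a message
    return dte_decode(c ^ password_pad(password, salt))
```

A real scheme would use a proper password-based encryption over long seeds. The point here is only that a wrong password never produces an obviously invalid plaintext.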
Distribution-Transforming Encoder (DTE)
• The encoder maps a message from the non-uniform message space to a seed in the uniform seed space, and it is the seed that is encrypted; the decoder maps a seed back to a message
• Example with 2-bit seeds: seed 00 decodes to "Gene_Q" (p = 1/4), seed 01 to "Gene_R" (p = 1/4), and seeds 10 and 11 to "Gene_S" (p = 1/2)
• $p_m$: original message distribution
• $p_d$: DTE message distribution (the probability of getting a message by decoding a uniformly random seed)
Distribution-Transforming Encoder (DTE), continued
• With the seed space divided this way, $p_d = p_m$. For example, "Gene_S" is obtained from 2 of the 4 equally likely seeds, so the DTE gives it probability 2/4 = 1/2, matching $p_m$
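The 2-bit example can be checked by enumeration. In this sketch, the table mirrors the slide and the names are hypothetical. Encoding picks uniformly among the seeds that decode to a message, and $p_d$ is computed by decoding every seed of the uniform seed space.

```python
import random
from fractions import Fraction

# Seed table from the slide: 2-bit seeds, all equally likely
SEED_TABLE = {0b00: "Gene_Q", 0b01: "Gene_R", 0b10: "Gene_S", 0b11: "Gene_S"}
P_M = {"Gene_Q": Fraction(1, 4), "Gene_R": Fraction(1, 4), "Gene_S": Fraction(1, 2)}

def decode(seed):
    return SEED_TABLE[seed]

def encode(message, rng):
    # choose uniformly among all seeds that decode to the message
    return rng.choice([s for s, m in SEED_TABLE.items() if m == message])

def dte_distribution():
    # p_d(m): fraction of the equally likely seeds that decode to m
    pd = {}
    for seed in SEED_TABLE:
        m = decode(seed)
        pd[m] = pd.get(m, 0) + Fraction(1, len(SEED_TABLE))
    return pd
```

Choosing the seed uniformly at encoding time matters: if "Gene_S" were always encoded as 10, the seed 11 would reveal that it never came from a real encryption.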
GenoGuard
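DTE on Genome Sequences
• n is the number of SNVs
• The probability of a sequence $M = (m_1, m_2, \dots, m_n)$, where each $m_i \in \{0, 1, 2\}$, follows from that of its subsequence $M_{1,n-1} = (m_1, m_2, \dots, m_{n-1})$: $P(M) = P(M_{1,n-1}) \, P(m_n \mid M_{1,n-1}) = \prod_{i=1}^{n} P(m_i \mid M_{1,i-1})$, with $M_{1,0}$ the empty sequence
• To encode the sequence: divide the seed space step by step while traversing the sequence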
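One way to make "divide the seed space while traversing the sequence" precise is sketched below in the style of arithmetic coding. The seed space is normalized to [0, 1), $M_{1,i-1} = (m_1, \dots, m_{i-1})$ denotes a prefix, and the interval start $\ell_i$ and width $w_i$ are symbols introduced here for illustration. At step i, the current interval is split among the values 0, 1, 2 in proportion to their conditional probabilities:

```latex
\begin{aligned}
\ell_0 &= 0, \qquad w_0 = 1,\\
\ell_i &= \ell_{i-1} + w_{i-1} \sum_{v < m_i} P(v \mid M_{1,i-1}),\\
w_i &= w_{i-1} \, P(m_i \mid M_{1,i-1}), \qquad i = 1, \dots, n.
\end{aligned}
```

Multiplying out the width recursion gives $w_n = \prod_{i=1}^{n} P(m_i \mid M_{1,i-1}) = P(M)$. A uniformly random seed therefore lands in M's interval $[\ell_n, \ell_n + w_n)$ with probability P(M), which is exactly $p_d = p_m$.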
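Example
• Number of SNVs: n = 3
• Sequence: M = (0, 2, 1)
• $P(m_1 = 0) = 0.6$
• $P(m_2 = 2 \mid m_1 = 0) = 0.1$
• $P(m_3 = 1 \mid M_{1,2} = (0, 2)) = 0.3$
• Hence $P(M) = 0.6 \times 0.1 \times 0.3 = 0.018$, and with $p_d = p_m$ the sequence M is assigned a 0.018 fraction of the seed space $[0, 2^L - 1]$ (L-bit representation)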
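The example can be reproduced with a small Python sketch of interval subdivision. It assumes a hypothetical first-order Markov model in which each SNV depends only on the previous one; GenoGuard's actual model of SNV correlations may differ. The transition values are made up, except that they match the three probabilities above, so the interval assigned to M = (0, 2, 1) has width 0.018. Exact fractions stand in for L-bit integer seeds.

```python
import random
from fractions import Fraction

VALUES = (0, 1, 2)  # genotype = number of minor alleles

def cond_prob(prefix):
    """Hypothetical model: P(m_i | m_1..m_{i-1}) depends only on m_{i-1}."""
    if not prefix:
        return {0: Fraction(6, 10), 1: Fraction(3, 10), 2: Fraction(1, 10)}
    table = {
        0: {0: Fraction(7, 10), 1: Fraction(2, 10), 2: Fraction(1, 10)},
        1: {0: Fraction(3, 10), 1: Fraction(4, 10), 2: Fraction(3, 10)},
        2: {0: Fraction(1, 10), 1: Fraction(3, 10), 2: Fraction(6, 10)},
    }
    return table[prefix[-1]]

def interval(seq):
    """Narrow [low, low + width) of the seed space [0, 1) along the sequence."""
    low, width = Fraction(0), Fraction(1)
    for i, m in enumerate(seq):
        probs = cond_prob(seq[:i])
        # skip the sub-intervals of values ordered before m
        low += width * sum(probs[v] for v in VALUES if v < m)
        width *= probs[m]
    return low, width

def encode(seq, rng):
    low, width = interval(seq)
    # any seed inside the interval decodes back to seq; pick one uniformly
    return low + width * Fraction(rng.random())

def decode(seed, n):
    seq, low, width = [], Fraction(0), Fraction(1)
    for _ in range(n):
        probs = cond_prob(seq)
        for v in VALUES:
            sub = width * probs[v]
            if seed < low + sub:
                seq.append(v)
                width = sub
                break
            low += sub
    return seq
```

For M = (0, 2, 1), this model gives the interval [0.546, 0.564), whose width is 0.6 × 0.1 × 0.3.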
Finite-Precision DTE
[Figure: $p_m$ and $p_d$ plotted over messages; under an L-bit representation $p_d \approx p_m$, with the probability of some messages increasing and of others decreasing]
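The probability shifts in the figure come from rounding each probability to a whole number of L-bit seeds. The sketch below assumes a hypothetical rounding rule, which is not necessarily GenoGuard's: round to the nearest integer, give every value at least one seed, and let the most probable value absorb the rounding surplus. Under this rule, rare values gain probability and the dominant value loses it.

```python
from fractions import Fraction

def allocate_seeds(probs, L):
    """Hypothetical rounding rule: give each value round(p * 2**L) seeds, at least
    one, and let the most probable value absorb the rounding surplus or deficit."""
    total = 2 ** L
    counts = {v: max(1, round(p * total)) for v, p in probs.items()}
    top = max(counts, key=counts.get)
    counts[top] += total - sum(counts.values())
    if counts[top] < 1:
        raise ValueError("L is too small for this distribution")
    return counts

def dte_distribution(counts):
    # p_d: fraction of the 2**L equally likely seeds assigned to each value
    total = sum(counts.values())
    return {v: Fraction(c, total) for v, c in counts.items()}

p_m = {0: 0.9, 1: 0.0999, 2: 0.0001}
p_d = dte_distribution(allocate_seeds(p_m, 4))
```

With L = 4, the counts are 13, 2, and 1 seeds, so $p_d$ = (13/16, 2/16, 1/16). The most probable value drops from 0.9 to 0.8125, and the rarest rises from 0.0001 to 0.0625. With L = 32, the same rule tracks $p_m$ closely.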
A Simple Brute-Force Attack
• One correct password among 1000 passwords
• The attacker computes the probability of each decrypted sequence
[Figure: results for conventional encryption versus GenoGuard]
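The attack can be simulated in a few lines. The simulation relies on strong simplifying assumptions that differ from the evaluation setup on the slide:

- SNVs are independent, with genotype probabilities derived from a single hypothetical minor-allele frequency under Hardy-Weinberg equilibrium.
- Under conventional encryption, a wrong password is modeled as producing a uniformly random genotype sequence.
- Under honey encryption, a wrong password decodes to a sequence drawn from the same model as real data.

The attacker ranks the 1000 candidates by log-likelihood.

```python
import math
import random

def genotype_probs(f):
    # Hardy-Weinberg genotype probabilities for 0, 1, 2 minor alleles
    return [(1 - f) ** 2, 2 * f * (1 - f), f ** 2]

def log_likelihood(seq, probs):
    return sum(math.log(probs[m]) for m in seq)

def rank_of_correct(rng, n=100, f=0.1, passwords=1000, honey=False):
    """Return the position of the true sequence when the attacker sorts all
    decryptions by likelihood (0 means the attack succeeds)."""
    probs = genotype_probs(f)
    correct = rng.choices((0, 1, 2), weights=probs, k=n)
    candidates = [correct]
    for _ in range(passwords - 1):
        if honey:
            # wrong password -> random seed -> decoded as a realistic sequence
            candidates.append(rng.choices((0, 1, 2), weights=probs, k=n))
        else:
            # wrong password -> random bits, modeled as uniformly random genotypes
            candidates.append([rng.randrange(3) for _ in range(n)])
    scores = [log_likelihood(c, probs) for c in candidates]
    order = sorted(range(passwords), key=lambda i: scores[i], reverse=True)
    return order.index(0)
```

With conventional encryption, the true sequence comes out on top because random bit strings decode to implausible genotypes. With honey encryption, the true sequence and the 999 decoys are drawn from the same distribution. The true sequence is then no more likely to rank first than any decoy, and the likelihood ranking gives the attacker no advantage.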
Defense against Attacks with Phenotypic Traits (1): Ancestry
[Figure: principal component 1 versus principal component 2 of genotype data, with individuals labeled by ancestry (African, European, Asian). Data from the HapMap project, an international project to find genetic variation associated with human disease.]
Defense against Attacks with Phenotypic Traits (2)
• Ancestry: use different DTEs for different ancestries. A wrong password yields a wrong sequence, yet one with consistent ancestry
[Figure: the same principal-component plot, with sequences decrypted as European shown as red "+" symbols]
• Other traits: quantification of privacy loss
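Here is a sketch of the ancestry-specific idea. It assumes independent SNVs and made-up minor-allele frequencies for three groups; these numbers are illustrative, not HapMap data, and the names are hypothetical. The user's ancestry selects which DTE decodes the seed. A wrong password yields a random seed, which still decodes to a genotype sequence with that ancestry's allele frequencies.

```python
import random

# Hypothetical minor-allele frequencies at five SNVs for three ancestry groups
# (illustrative values only, not HapMap data).
MAF = {
    "African": [0.40, 0.05, 0.30, 0.10, 0.50],
    "European": [0.10, 0.35, 0.05, 0.40, 0.20],
    "Asian": [0.25, 0.20, 0.45, 0.02, 0.15],
}

def genotype_probs(f):
    # Hardy-Weinberg genotype probabilities for 0, 1, 2 minor alleles
    return (1 - f) ** 2, 2 * f * (1 - f), f ** 2

def decode(seeds, ancestry):
    """Ancestry-specific DTE decoder: one uniform seed in [0, 1) per SNV,
    mapped through that ancestry's genotype distribution (SNVs assumed independent)."""
    seq = []
    for u, f in zip(seeds, MAF[ancestry]):
        p0, p1, _ = genotype_probs(f)
        seq.append(0 if u < p0 else 1 if u < p0 + p1 else 2)
    return seq

def decoy(rng, ancestry):
    # a wrong password yields a uniformly random seed, decoded with the stored ancestry's DTE
    return decode([rng.random() for _ in MAF[ancestry]], ancestry)
```

Decoys decoded with the European DTE carry, on average, about 2f minor alleles at each SNV, where f is the European frequency. An ancestry inference should therefore tend to place them with European genomes rather than flag them as implausible.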
Performance
Performance of Encryption (Decryption)
[Figure: encoding and decoding time (seconds) versus chromosome length (number of SNVs, up to 100,000), together with password-based encryption (decryption) time]
• Python implementation
• Cluster with 22 nodes
• 3.40 GHz Intel Xeon CPU E3-1270
• 64-bit Debian Linux system
• Linear cost in the number of SNVs: 0.5 ms per SNV
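Because the cost is linear, the reported rate gives a back-of-the-envelope estimate for other input sizes. This sketch assumes the 0.5 ms/SNV figure carries over unchanged. That figure was measured on the setup above, and the estimate ignores parallelism across the cluster. The function name is hypothetical.

```python
MS_PER_SNV = 0.5  # reported average cost per SNV

def estimated_seconds(n_snvs, ms_per_snv=MS_PER_SNV):
    # linear model: total time = number of SNVs x cost per SNV
    return n_snvs * ms_per_snv / 1000
```

This gives about 50 s for a chromosome of 100,000 SNVs and roughly 2,000 s for 4 million SNVs processed sequentially.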
Conclusion and Future Work
• GenoGuard protects genomic data against brute-force attacks
• A privacy-preserving solution that takes into account the special characteristics of genomic data
• Future investigation
  o Extension to other sensitive sequential data
  o Further investigation of privacy erosion as the data model evolves
• Source code: https://github.com/acs6610987/GenoGuard
• To learn more about genomic privacy, see the website for the research community: https://genomeprivacy.org