TOP 10 CHALLENGES IN CRYPTOGRAPHY FOR BIG DATA
Arnab Roy, Researcher, Fujitsu Laboratories of America
Copyright © 2014 Cloud Security Alliance · www.cloudsecurityalliance.org
How is cryptography for Big Data different?
§ BIG
  § Scale up existing solutions for volume, variety and velocity
  § Retarget to Big Data infrastructural shift
§ DATA
  § Balance privacy and utility
  § Enable analytics and governance on encrypted data
  § Reconcile authentication and anonymity
Top 10 Challenges identified by CSA BDWG
Categories: Infrastructure · Encryption · Privacy · Data Governance
§ Communication protocols
§ Access policy based encryption
§ Secure dissemination
§ Data integrity
§ Key management
§ Searching / filtering encrypted data
§ Secure data collection / aggregation
§ Proof of data storage
§ Secure outsourcing of computation
§ Secure collaboration
Key Points
§ Secure Dissemination: Differential Privacy
§ Policy-based Encryption
§ Searching/Filtering Encrypted Data
§ Secure Outsourcing of Computation
§ Proof of Data Storage
Data Privacy
§ Anonymization is not privacy!
§ Suppose there is a public movie database
  § User Joe Smith likes movies U, V, W
§ Suppose there is a private movie database
  § Anonymous likes movies M, N, U, V, W, X, Y
§ For a rare combination of movies U, V, W, it becomes extremely unlikely that someone other than Joe Smith likes the same exact 3 movies
  § So it is likely that Joe Smith likes movies M, N, X, Y as well
  § This may reveal private information about Joe, like his political or religious inclinations
§ This is the type of exploit used in de-anonymizing the Netflix dataset by Narayanan and Shmatikov
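The linkage argument on this slide can be sketched in a few lines of Python. This is only an illustration: the record ids (`anon-1`, ...), the extra movies, and the helper names `link` and `reidentify` are hypothetical, and a unique match is treated as a likely identification, not a proof.

```python
# Toy linkage attack: match a named public profile against "anonymous" records.
# All data below is made up for illustration.
public_profiles = {"Joe Smith": {"U", "V", "W"}}

anonymized_records = {  # names replaced by opaque ids before release
    "anon-1": {"M", "N", "U", "V", "W", "X", "Y"},
    "anon-2": {"A", "B", "U"},
    "anon-3": {"C", "V", "X"},
}

def link(profile, records):
    """Return ids of anonymized records that contain every movie in the profile."""
    return [rid for rid, movies in records.items() if profile <= movies]

def reidentify(public, records):
    """Map each public name to the extra movies revealed by a unique match."""
    leaked = {}
    for name, liked in public.items():
        matches = link(liked, records)
        if len(matches) == 1:  # a unique match is what makes the link convincing
            leaked[name] = records[matches[0]] - liked
    return leaked

leaked = reidentify(public_profiles, anonymized_records)
print(sorted(leaked["Joe Smith"]))  # movies Joe never disclosed publicly
```

The rarer the public combination, the fewer anonymized records contain it, which is why removing names alone does not protect the record.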
Differential Privacy
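For reference, the standard definition of ε-differential privacy from the general literature (not necessarily the exact notation used on the slide) can be written as follows. The symbols $\mathcal{M}$ (randomized mechanism), $D, D'$ (datasets differing in one record), $S$ (any set of outputs), $f$ (the query) and $\Delta f$ (its sensitivity) are notation chosen here.

```latex
% A mechanism M is epsilon-differentially private if, for all neighbouring
% datasets D, D' and every set of outputs S:
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[\mathcal{M}(D') \in S]

% The Laplace mechanism for a numeric query f with sensitivity
% \Delta f = \max_{D, D'} |f(D) - f(D')| achieves this guarantee:
\mathcal{M}(D) = f(D) + \mathrm{Lap}\!\left(\frac{\Delta f}{\varepsilon}\right)
```

A smaller $\varepsilon$ means the output distribution changes less when one person's record is added or removed, at the cost of noisier answers.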
Differential Privacy
[Figure: Laplace distribution. Source: Wikipedia]
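As a rough sketch of how the Laplace distribution is used here, the following Python adds Laplace noise to a counting query. The function names, the `epsilon` parameter name and the sample records are assumptions made for this example. The noise is drawn as the difference of two exponential samples, which is one standard way to sample a Laplace variable.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential samples."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon, rng=None):
    """Answer 'how many records satisfy predicate?' with Laplace noise added."""
    rng = rng or random.Random()
    true_count = sum(1 for r in records if predicate(r))
    sensitivity = 1.0  # adding or removing one record changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)

# Hypothetical data: which users liked movie "U".
records = [{"user": i, "likes_U": i % 3 == 0} for i in range(9)]
print(private_count(records, lambda r: r["likes_U"], epsilon=0.5))
```

Each call returns a different noisy value centred on the true count; a production system would also have to track how much privacy budget repeated queries consume, which this sketch does not do.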
Policy-based Encryption
§ Traditionally access control has been enforced by systems: Operating Systems, Virtual Machines
  § Restrict access to data, based on access policy
  § Data is still in plaintext
  § Systems can be hacked!
  § Security of the same data in transit is a separate concern
§ What if we protect the data itself in a cryptographic shell depending on the access policy?
  § Decryption only possible by entities allowed by the policy
  § Keys can be hacked! But the attack surface is much smaller
  § Encrypted data can be moved around, as well as kept at rest: uniform handling
Identity-based Encryption
[Figure labels: Public-Key Encryption; ID-based Encryption; Certificate Authority; Master Authority; Master Public Key; Signed Certificate of Public Key (Bob, Alice); Encrypted Data (Bob, George)]
Policy-Based Encryption
[Figure: policy tree with OR and AND gates over the attributes Doctor, Nurse, ICU; public key PK; secret keys SK carrying attributes such as “Doctor”, “Neurology” and “Nurse”, “Physical Therapy”. Attribution: Mitchell et al.]
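To make the idea of a policy over attributes concrete, here is a minimal Python sketch that only evaluates which attribute sets satisfy a policy tree. It contains no cryptography: in an actual attribute-based encryption scheme the check is enforced by the mathematics of decryption, not by an `if` statement. The specific policy `Doctor OR (Nurse AND ICU)` and the tuple encoding are assumptions for illustration.

```python
def satisfies(policy, attributes):
    """Evaluate a policy tree against a set of attribute strings.

    A policy is either an attribute name (str) or a tuple
    ("AND" | "OR", child, child, ...).
    """
    if isinstance(policy, str):
        return policy in attributes
    gate, *children = policy
    results = (satisfies(child, attributes) for child in children)
    if gate == "AND":
        return all(results)
    if gate == "OR":
        return any(results)
    raise ValueError(f"unknown gate: {gate}")

# Assumed policy for this example: Doctor OR (Nurse AND ICU).
POLICY = ("OR", "Doctor", ("AND", "Nurse", "ICU"))

print(satisfies(POLICY, {"Doctor", "Neurology"}))        # key of a neurologist
print(satisfies(POLICY, {"Nurse", "Physical Therapy"}))  # nurse outside the ICU
```

Under the assumed policy, a key holding “Doctor” satisfies the tree regardless of department, while a key holding “Nurse” also needs “ICU”.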
Searching and Filtering Encrypted Data
§ Suppose you have a system to receive emails encrypted under your public key
§ However, you do not want to receive spam mails
§ With plain public key encryption, there is no way to distinguish a legitimate email ciphertext from a spam ciphertext!
§ However, with ongoing research you will be able to do the following:
  § Give a ‘token’ to the spam filter
  § Spam filter can apply the token to the ciphertext, only deducing whether it is spam or not
  § Filter doesn’t get any clue about any other property of the mail!
Searching and Filtering Encrypted Data
[Figure labels: Public Key; Filtering Token; Secret Key; Bob]
Searching with Encrypted Filter
§ Problem Scenario:
  § The intelligence gathering community needs to collect a useful subset of huge streaming sources of data
  § The criteria for being useful may be classified: private criteria
  § Most of the streaming data is useless and storing it all may be impractical: filter at source
  § How do we keep the filtering criteria secret even if the filter is executing at the source?
§ Solution: Obfuscate the filtration code
  § Even if the source falls into enemy hands, the enemy cannot figure out the criteria
Secure data filtration
[Figure labels: Secret Criteria; Obfuscate; Garbled Filter; sources: Blogs, Net Traffic, News Feed; Cloud; Encrypted Filtered Data; Decrypt; Filtered Data]
Reference: Private Searching on Streaming Data, Ostrovsky and Skeith, CRYPTO 2005
Secure Outsourcing of Computation
§ Suppose you want to send all your sensitive data to the cloud: photos, medical records, financial records, …
§ You could send everything encrypted
  § But that wouldn’t be much use if you wanted the cloud to perform some computations on it
  § What if you wanted to see how much you spent on movies last month?
§ Solution: Fully Homomorphic Encryption
  § Cloud can perform any computation on the underlying plaintext, while the inputs and results stay encrypted!
  § Cloud has no clue about the plaintext or the results
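The movie-spending question can be illustrated with a much simpler relative of FHE. The sketch below uses a toy Paillier cryptosystem, which is only additively homomorphic (it can sum encrypted values but cannot run arbitrary computations), with primes far too small for any real security. The prime values, function names and spending amounts are assumptions for illustration; `pow(x, -1, n)` needs Python 3.8 or later.

```python
import math
import random

# Toy primes, far too small for real security (illustration only).
P, Q = 1009, 1013
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p-1, q-1), secret
MU = pow(LAM, -1, N)  # secret; this shortcut is valid for generator g = N + 1

def encrypt(m, rng=random):
    """Encrypt an integer 0 <= m < N using only the public value N."""
    while True:
        r = rng.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    # (N + 1)^m mod N^2 equals 1 + m*N, so no exponentiation is needed for g^m.
    return ((1 + m * N) * pow(r, N, N2)) % N2

def decrypt(c):
    """Recover the plaintext using the secret values LAM and MU."""
    return ((pow(c, LAM, N2) - 1) // N) * MU % N

def add_encrypted(c1, c2):
    """Cloud-side step: multiplying ciphertexts adds the hidden plaintexts."""
    return (c1 * c2) % N2

movie_spend = [12, 9, 15]                       # known only to the customer
ciphertexts = [encrypt(x) for x in movie_spend]  # uploaded to the cloud
total_ct = ciphertexts[0]
for c in ciphertexts[1:]:
    total_ct = add_encrypted(total_ct, c)        # computed without the secret key
print(decrypt(total_ct))                         # customer decrypts the total
```

The party running `add_encrypted` never sees 12, 9, 15 or their sum. Fully homomorphic schemes extend this idea from addition to arbitrary computations, at a much higher cost.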
Secure Outsourcing of Computation
Fully Homomorphic Encryption (FHE)
[Figure: Plaintext -> Encrypt -> Ciphertext -> Homomorphic Transformation -> Processed Ciphertext -> Decrypt -> Plaintext]
• With FHE, computation on plaintext can be transformed into computation on ciphertext
• As a use case, a cloud can keep and process a customer’s data without ever knowing the contents
• Only the customer can decrypt the processed data
• End to end security of customer data
How does FHE work?
[Figure. Source: http://cseweb.ucsd.edu/~daniele/lattice.html]
Proof of Data Storage
§ You just uploaded a massive amount of data to the cloud
§ How do you know at any point whether the data is actually available at the cloud?
  § The cloud could have deleted or modified part of your data
§ Bad solution: just ask it to resend the data back to you
  § Too expensive and defeats the point of moving the data to the cloud
§ Bad solution: ask it to send a hash of the data
  § Can be replayed
§ Crypto solution: Proof of Data Storage
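One simple way to avoid both bad solutions is a MAC-based spot check: the client tags each block before upload, keeps only a secret key, and later challenges a few random block indices. The Python below is a minimal sketch of that idea, not the construction from the slides or from the cited papers. The block size, the `Cloud` class and all function names are assumptions for illustration.

```python
import hashlib
import hmac
import secrets

BLOCK_SIZE = 64  # bytes; kept small so the example stays readable

def split_blocks(data):
    """Cut the file into fixed-size blocks."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]

def tag_block(key, index, block):
    """MAC over the block index and contents, so blocks cannot be swapped."""
    return hmac.new(key, index.to_bytes(8, "big") + block, hashlib.sha256).digest()

class Cloud:
    """Stands in for the storage provider: holds blocks and their tags."""
    def __init__(self, blocks, tags):
        self.blocks = list(blocks)
        self.tags = list(tags)

    def respond(self, indices):
        return [(self.blocks[i], self.tags[i]) for i in indices]

def audit(key, cloud, num_blocks, sample_size):
    """Challenge random block indices and verify the returned tags."""
    indices = [secrets.randbelow(num_blocks) for _ in range(sample_size)]
    for i, (block, tag) in zip(indices, cloud.respond(indices)):
        if not hmac.compare_digest(tag, tag_block(key, i, block)):
            return False
    return True

key = secrets.token_bytes(32)  # the only state the client keeps
blocks = split_blocks(secrets.token_bytes(BLOCK_SIZE * 100))
tags = [tag_block(key, i, b) for i, b in enumerate(blocks)]
cloud = Cloud(blocks, tags)
print(audit(key, cloud, len(blocks), sample_size=10))
```

Because the challenged indices are fresh and random each time, an old answer cannot simply be replayed, and only a few blocks cross the network per audit. If a fraction f of the blocks is damaged, a sample of k blocks detects it with probability 1 - (1 - f)^k, so small corruptions need larger samples or the more advanced techniques.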
Some Math
Proof of Data Storage
§ More advanced techniques
  § “PORs: Proofs of Retrievability for Large Files” by Juels and Kaliski
  § “Compact Proofs of Retrievability” by Shacham and Waters
Why should we do crypto research?
§ Cryptography is an enabler
  § There is a realization across the industry that cryptographic technologies are imperative for cloud and big data
  § Mathematical assurance of trust gives people more incentive to migrate data and computation to the cloud
§ Significant opportunities to explore
  § Mere anonymization is not privacy
    § Systematic and mathematical considerations need to be applied when responding to queries on personal data
  § Sophisticated techniques that enable rich transformations and management of encrypted data are in the research stage or have limited deployments