A Secure Multi-party Computation Scheme for Privacy-preserving Association Rules Mining

A Secure Multi-party Computation Scheme for Privacy-preserving Association Rules Mining
Presenter: Chunhua Su, Master 1, Sakurai Lab, Graduate School of Information Science and Electrical Engineering, Kyushu University
12/2/2020

Outline
• Introduction and Background
• Related Works
• Privacy-preserving Association Rules Mining Protocol
• Security and Complexity Analysis
• Conclusions and Future Works

Data Mining and Privacy
• Primary task in data mining: develop models about aggregated data, that is, find valuable information in large amounts of data that is not obvious at first sight.
• Privacy-preserving data mining has concentrated on obtaining valid results when the input data is private.
• So we want to develop accurate models without access to the precise information in individual data records.

Privacy Constraints
• Individual Privacy
  • Nobody should know more about any entity after the data mining than they did before.
  • Approaches: data obfuscation, value swapping.
• Organization Privacy
  • Protect knowledge about a collection of entities.
  • Individual entity values may be known to all parties.
  • Which entities are at which site may be secret.

Can we develop accurate models without access to the original data?

Solution
• Insurance companies
  • Do not give access to the original data.
  • Provide some sort of statistics on the data so that the original data cannot be retrieved from such statistics.
  • Such data can be used to identify trends and patterns.

Association Rule Mining
• Given a set of transactions, find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction.
• Also known as market basket analysis.
[Table: market-basket transactions]
Example of association rules:
{Diaper} → {Beer}
{Milk, Bread} → {Eggs, Coke}
{Beer, Bread} → {Milk}
Implication means co-occurrence, not causality!
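How strongly a rule such as {Diaper} → {Beer} holds is usually measured by its support (the fraction of transactions containing all items of the rule) and its confidence (among transactions containing the left-hand side, the fraction that also contain the right-hand side). The sketch below computes both. The transaction list and the names `TRANSACTIONS`, `support` and `confidence` are invented for illustration and do not come from the slides.

```python
# Hypothetical market-basket transactions, invented for illustration only.
TRANSACTIONS = [
    frozenset({"Bread", "Milk"}),
    frozenset({"Bread", "Diaper", "Beer", "Eggs"}),
    frozenset({"Milk", "Diaper", "Beer", "Coke"}),
    frozenset({"Bread", "Milk", "Diaper", "Beer"}),
    frozenset({"Bread", "Milk", "Diaper", "Coke"}),
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(lhs, rhs, transactions):
    """Estimate of P(rhs in transaction | lhs in transaction)."""
    lhs_support = support(lhs, transactions)
    if lhs_support == 0:
        return 0.0
    return support(lhs | rhs, transactions) / lhs_support


diaper_beer_support = support({"Diaper", "Beer"}, TRANSACTIONS)
diaper_beer_confidence = confidence({"Diaper"}, {"Beer"}, TRANSACTIONS)
```

A rule is normally reported only when both values exceed user-chosen thresholds. Neither value says anything about causality, only about co-occurrence.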

Related Works
• Random Data Perturbation Methodology
  • R. Agrawal et al., “Privacy Preserving Data Mining”
  • Alexandre Evfimievski et al., “Privacy Preserving Mining of Association Rules”
• Cryptography-based Methodology
  • M. Kantarcioglu, C. Clifton, “Privacy preserving distributed mining of association rules on horizontally partitioned data”
  • J. S. Vaidya, C. Clifton, “Privacy preserving association rule mining in vertically partitioned data”

Randomization Approach Overview (R. Agrawal et al., “Privacy Preserving Data Mining”)
[Figure: original records of the form Age | Salary (e.g. 30 | 70K, 50 | 40K) pass through a Randomizer and become perturbed records (e.g. 65 | 20K, 25 | 60K). The distributions of Age and of Salary are reconstructed from the perturbed records and fed to the data mining algorithms, which produce the model.]

Reconstruction Problem
• Original values $x_1, x_2, \ldots, x_n$ from probability distribution $X$ (unknown).
• To hide these values, we use $y_1, y_2, \ldots, y_n$ from probability distribution $Y$.
• Given
  • $x_1 + y_1, x_2 + y_2, \ldots, x_n + y_n$
  • the probability distribution of $Y$
  estimate the probability distribution of $X$.
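One common way to attack this estimation problem is an iterative Bayesian (EM-style) update: start from a uniform guess for the distribution of X, compute for every observed sum w = x + y the posterior probability of each candidate x, and average those posteriors to get the next guess. The slides do not spell out the procedure, so the discrete sketch below is an assumption-laden illustration. The value set, the noise distribution, the sample size and all names are hypothetical.

```python
import random
from collections import Counter


def reconstruct(ws, x_values, noise_pmf, iterations=300):
    """Estimate the pmf of X from sums w_i = x_i + y_i and the known pmf of Y."""
    counts = Counter(ws)
    n = len(ws)
    est = {a: 1.0 / len(x_values) for a in x_values}  # uniform starting guess
    for _ in range(iterations):
        new = {a: 0.0 for a in x_values}
        for w, c in counts.items():
            # Unnormalised posterior weight of x = a given the observation w.
            weights = {a: noise_pmf.get(w - a, 0.0) * est[a] for a in x_values}
            total = sum(weights.values())
            if total > 0.0:
                for a in x_values:
                    new[a] += c * weights[a] / total
        est = {a: new[a] / n for a in x_values}
    return est


# Hypothetical setup: X takes values 0..3, Y is uniform noise on -2..2.
rng = random.Random(0)
x_values = [0, 1, 2, 3]
true_pmf = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}
noise_values = [-2, -1, 0, 1, 2]
noise_pmf = {y: 1.0 / len(noise_values) for y in noise_values}

xs = rng.choices(x_values, weights=[true_pmf[a] for a in x_values], k=5000)
ws = [x + rng.choice(noise_values) for x in xs]  # only these sums are published
estimate = reconstruct(ws, x_values, noise_pmf)
```

Only `ws` and `noise_pmf` are used by `reconstruct`, so the individual values in `xs` stay hidden while the aggregate distribution is approximately recovered.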

Overview of the Cryptographic Method (Kantarcioglu and Clifton’s Proposal)
• Find the union of the locally large candidate itemsets securely.
• After the local pruning, compute the globally supported large itemsets securely.
• At the end, check the confidence of the potential rules securely.
In this method, every client suffers from high computation complexity and communication complexity. Our protocol is going to solve this problem.
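To make the first two steps concrete, the following sketch shows only the plaintext logic for horizontally partitioned data: each site finds its locally large itemsets, the candidates are unioned, and the global support of each candidate is the sum of the local counts. It contains no cryptography at all, so it is not the secure protocol of the cited work. The site databases, the threshold and all function names are hypothetical.

```python
from itertools import combinations


def local_counts(db, size):
    """Count how often each itemset of the given size occurs at one site."""
    counts = {}
    for transaction in db:
        for combo in combinations(sorted(transaction), size):
            counts[combo] = counts.get(combo, 0) + 1
    return counts


def globally_large(sites, size, min_support):
    """Plaintext (insecure) version of steps 1 and 2."""
    per_site = [local_counts(db, size) for db in sites]

    # Step 1: union of the locally large candidate itemsets.
    candidates = set()
    for db, counts in zip(sites, per_site):
        threshold = min_support * len(db)
        candidates |= {c for c, k in counts.items() if k >= threshold}

    # Step 2: keep candidates whose summed count is globally large.
    total = sum(len(db) for db in sites)
    result = {}
    for c in candidates:
        global_count = sum(counts.get(c, 0) for counts in per_site)
        if global_count >= min_support * total:
            result[c] = global_count / total
    return result


# Hypothetical databases of two sites.
site_a = [{"Bread", "Milk"}, {"Bread", "Milk", "Beer"}, {"Milk"}]
site_b = [{"Bread", "Beer"}, {"Bread", "Milk"}, {"Beer", "Diaper"}]
large_pairs = globally_large([site_a, site_b], size=2, min_support=0.5)
```

Restricting the candidates to locally large itemsets loses nothing, because an itemset that is globally large must be locally large at one site at least. In the cryptographic method both the union and the summed counts are computed without revealing which site contributed what.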

Cryptographic methods vs. perturbation methods
[Figure: trade-off diagram with the axes overhead, inaccuracy and lack of privacy, positioning cryptographic methods, perturbation methods and this work.]

Privacy-preserving Association Rules Mining Protocol

Secure Multiparty Computation: Yao’s Millionaires’ Problem
[Figure: Ms. Million (worth $a) and Mr. Trump (worth $b) ask a black box “Who’s richer?”; the only output is the answer, “Ms. Million is even richer!”]
Figure © Anli Chen, Nov. 17, 2004, Electrical Engineering, Texas A&M University

Secure Multiparty Computation: Definitions
• Secure
  • Nobody knows anything but their own input and the results.
  • Formally: there exists a polynomial-time simulator $S$ such that $\{S(x, f(x, y))\} \equiv \{\mathrm{View}(x, y)\}$.
• Semi-honest model: parties follow the protocol, but remember intermediate exchanges.
• Malicious model: parties may “cheat” to find something out.

Privacy-preserving Association Rules Mining Protocol
Our purposes:
1. To get an accurate global association rules mining result over the distributed databases.
2. To preserve the clients’ privacy.
3. To maintain the anonymity of the clients.

The Components of Our Protocol
• Mining Clients: want to get the global result together with the other distributed clients while preserving their privacy.
• Computation Center: acts as a key center that generates a public and a private key for each client, and then performs the association rules mining computation.
• Non-Colluding Mix Center: arranges the order of the transaction items’ IDs in matrix form and mixes the encrypted transactions.

Toolkits
• Commutative Encryption: a public key encryption scheme.
• Definition: an encryption scheme is commutative if $E_{k_1}[E_{k_2}[M]] = E_{k_2}[E_{k_1}[M]]$, and hence $D_{k_1}[D_{k_2}[E_{k_1}[E_{k_2}[M]]]] = M$.
• We use commutative encryption to encrypt the clients’ transactions.
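The commutativity property can be demonstrated with a toy exponentiation cipher, E_k[M] = M^k mod p, in the style of Pohlig-Hellman/SRA. This is a swapped-in illustration: it is a symmetric cipher, not the public key scheme the slides refer to, and the prime, the exponents and the function names are all assumptions chosen for the example. The parameters are far too small for real use.

```python
from math import gcd

# Toy modulus: the Mersenne prime 2**61 - 1 (illustrative size only).
P = 2**61 - 1


def make_key(e):
    """Return (encryption exponent, decryption exponent) for modulus P."""
    if gcd(e, P - 1) != 1:
        raise ValueError("exponent must be coprime to P - 1")
    # Modular inverse via three-argument pow (Python 3.8 or newer).
    return e, pow(e, -1, P - 1)


def encrypt(m, e):
    """E_k[M] = M^e mod P, for 1 <= m < P."""
    return pow(m, e, P)


def decrypt(c, d):
    """D_k[C] = C^d mod P, where d is the inverse of e modulo P - 1."""
    return pow(c, d, P)


e1, d1 = make_key(65537)  # hypothetical key k1
e2, d2 = make_key(257)    # hypothetical key k2
message = 123456789

c12 = encrypt(encrypt(message, e2), e1)  # E_k1[E_k2[M]]
c21 = encrypt(encrypt(message, e1), e2)  # E_k2[E_k1[M]]
recovered = decrypt(decrypt(c12, d2), d1)  # D_k1[D_k2[E_k1[E_k2[M]]]]
```

Both encryption orders give the same ciphertext because (M^a)^b = (M^b)^a mod p, and the two decryptions can likewise be applied in either order.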

Protocol Description
• In our protocol, the transaction database can also be seen as a binary matrix where each row corresponds to a transaction and each column corresponds to an item.
• The key technical problem in privacy-preserving association rule mining is to find frequent itemsets securely.
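A minimal sketch of the binary-matrix view follows: every row is a transaction, every column is an item in an agreed order, and an itemset occurs in a row when all of its columns are 1. The item names, the column order and the function names are hypothetical, and nothing here is encrypted.

```python
# Hypothetical agreed column order for the items.
ITEMS = ["Beer", "Bread", "Diaper", "Milk"]


def to_matrix(transactions, items):
    """One row per transaction, one 0/1 column per item."""
    return [[1 if item in t else 0 for item in items] for t in transactions]


def itemset_count(matrix, columns):
    """Number of rows in which every listed column is 1."""
    return sum(1 for row in matrix if all(row[c] == 1 for c in columns))


transactions = [
    {"Bread", "Milk"},
    {"Beer", "Diaper"},
    {"Beer", "Bread", "Diaper"},
]
matrix = to_matrix(transactions, ITEMS)
beer_diaper_rows = itemset_count(matrix, [0, 2])  # columns for Beer and Diaper
```

An itemset is frequent when this count, divided by the number of rows, reaches the chosen support threshold.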

Protocol (1) Agreement on Matrix ID Order
[Figure: a “Matrix ID Order Request” is exchanged between the Non-Colluding Mix Center and Client_1, …, Client_i, …, Client_n.]

Protocol (2) Encryption Keys Distribution
[Figure: the key set {pk_1 … pk_n} is shown between the Computation Center and the Non-Colluding Mix Center; Client_1, …, Client_i, …, Client_n are labeled with pk_1, co-pk; pk_i, co-pk; and pk_n, co-pk respectively.]

Protocol (3)
[Figure: the Non-Colluding Mix Center and Client_1, …, Client_i, …, Client_n.]

Protocol (4)
[Figure: labels “All the re-encrypted transactions” (between the Non-Colluding Mix Center and the Computation Center), “Decrypt all the re-encrypted transactions” and “Do the association rules mining” (at the Computation Center), and “The global mining result” (toward Client_1, …, Client_i, …, Client_n).]

Security Analysis
• Computation Center view: does not learn anything useful for recovering the clients’ private data, because all it sees are values without the transaction IDs needed to identify the items.
• Non-Colluding Mix Center view: learns the matrix ID order but does not know the shared key co-pk, so it cannot recover the itemsets, under the assumption that the public-key encryption scheme is semantically secure.

Complexity Analysis
• Communication complexity: each site expresses its transaction data as a matrix, so for each itemset of size j, O(n j) bits must be sent during the execution.
• Computation complexity: the Computation Center bears most of the computation cost, since it has to decrypt nt bits nt times.

Conclusion & Future Work
• We proposed a general framework of secure multi-party computation for privacy-preserving association rules mining.
• Privacy and efficiency are both important for secure data mining.
• Our protocol has lower computation complexity and communication complexity than other related schemes.
• Reducing the communication complexity further is also a problem still to be solved.