QUANTIFYING INFORMATION LOSS AFTER REDACTING DATA PROVENANCE
TEAM: AVINI SOGANI, VAISHNAVI SUNKU, VENUGOPAL BOPPA

INTERNET OF THINGS

SEMANTIC WEB AND PROVENANCE
• Semantics: the meaning behind anything you say
• The Semantic Web is a platform for securely sharing heterogeneous data on the web.
• The provenance of data can be traced back to the origin of the data, or simply to its immediate source.
• Provenance supports assessment of authenticity, enables trust, and gives assurance of data quality, which in turn allows the resource to be reproduced.

ACCESS CONTROL
• Imposing restrictions on data access by users
• Types – DAC (Discretionary Access Control), MAC (Mandatory Access Control), RBAC (Role-Based Access Control)

REDACTION
• Process of removing or hiding sensitive data
• Protects sensitive information from unauthorized users
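As a rough illustration of redaction on provenance data, the sketch below hides selected nodes in a small list of (subject, predicate, object) triples. The triples, the predicate names, the `redact` helper, and the `[REDACTED]` placeholder are all hypothetical and chosen only for this example; this is not the redaction mechanism of any particular system.

```python
# Hypothetical provenance triples (subject, predicate, object).
# Every identifier here is illustrative only.
TRIPLES = [
    ("med:Doc1_2", "wasGeneratedBy", "med:HeartSurgery1"),
    ("med:HeartSurgery1", "wasControlledBy", "med:Surgeon1"),
    ("med:Doc1_2", "wasDerivedFrom", "med:Doc1_1"),
]


def redact(triples, sensitive, placeholder="[REDACTED]"):
    """Return a copy of the triples with every sensitive node hidden."""
    def hide(node):
        return placeholder if node in sensitive else node

    # Predicates are left untouched; only subject and object nodes are hidden.
    return [(hide(s), p, hide(o)) for s, p, o in triples]


redacted = redact(TRIPLES, sensitive={"med:Surgeon1"})
for triple in redacted:
    print(triple)
```

The original list is left unchanged, so the redacted and unredacted views can later be compared to estimate how much information was lost.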

RELATED WORK

PRIVACY CONTROL ACTS
• HIPAA – Health Insurance Portability and Accountability Act
• Regulates EMR/EPR
• PHI – Protected Health Information
• PII – Personally Identifiable Information
• HITECH Act – Health Information Technology for Economic and Clinical Health
• Minimum necessary for the stated purpose

W3C RECOMMENDATIONS
• A.C. model applications:
  • File systems
  • Database
  • Provenance?
• Data Models:
  • RDF (triples: subject, predicate, object)
  • OPM
• Querying:
  • OPQL (from(e), to, from⁻¹(n), to⁻¹, prev(n), next)
  • SPARQL (regular expressions)
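To make the triple data model concrete, here is a minimal sketch of pattern matching over (subject, predicate, object) triples, plus a transitive walk along one predicate. It uses plain Python rather than SPARQL or OPQL; the `match` and `ancestors` helpers, the `wasDerivedFrom` predicate name, and the sample identifiers are assumptions made for illustration.

```python
# Hypothetical provenance triples; names are illustrative only.
TRIPLES = [
    ("ex:Doc3", "wasDerivedFrom", "ex:Doc2"),
    ("ex:Doc2", "wasDerivedFrom", "ex:Doc1"),
    ("ex:Doc2", "wasGeneratedBy", "ex:Process1"),
]


def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def ancestors(triples, node, predicate="wasDerivedFrom"):
    """Collect every node reachable from `node` by following `predicate`."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for _, _, parent in match(triples, s=current, p=predicate):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(match(TRIPLES, p="wasGeneratedBy"))
print(sorted(ancestors(TRIPLES, "ex:Doc3")))
```

The wildcard pattern plays the role of a single triple pattern in a query, and the transitive walk approximates a path expression over one edge type.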

REDACTION POLICIES
• Medical scenario

REDACTION ON DATA PROVENANCE
• Why med:Doc1_2?

REDACTION BY GRAPH GRAMMAR AND R. E.

ARCHITECTURE

LIMITATIONS

• No quantification of the information lost through the process of redaction
• Redacted information may still be available from other sources (the internet, knowledge of the context, etc.)

OUR PROPOSALS

INFORMATION LOSS
• Relevance of the data to the user
• Vectorial model formula for calculating the relevance
• Terms:
  • True relevant data
  • Retrieved data
  • Relevant data
• F-measure (precision and recall)
• NMI (Normalized Mutual Information)
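One possible way to turn these terms into numbers is sketched below. It assumes a vector model with raw term-frequency weights and cosine similarity for relevance, and it takes information loss to be the drop in F-measure between retrieval on the original data and retrieval on the redacted data. Both choices, and all function names and sample data, are assumptions for illustration rather than the exact formulas of the proposal.

```python
import math
from collections import Counter


def cosine_relevance(query, document):
    """Cosine similarity between raw term-frequency vectors of two texts."""
    q = Counter(query.lower().split())
    d = Counter(document.lower().split())
    dot = sum(q[term] * d[term] for term in q)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in d.values())))
    return dot / norm if norm else 0.0


def precision_recall_f(retrieved, relevant):
    """Precision, recall and F-measure (harmonic mean) for two sets."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical example: documents relevant to a user, and what a query
# retrieves before and after redaction.
relevant = {"d1", "d2", "d3", "d4"}
retrieved_before = {"d1", "d2", "d3", "d5"}
retrieved_after = {"d1", "d5"}

_, _, f_before = precision_recall_f(retrieved_before, relevant)
_, _, f_after = precision_recall_f(retrieved_after, relevant)
information_loss = f_before - f_after

print(cosine_relevance("heart surgery", "[REDACTED] surgery"))
print(f_before, f_after, information_loss)
```

Under these assumptions a loss of 0 means redaction did not change what the user can retrieve, and larger values mean more relevant data became unreachable.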

INFORMATION LOSS
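NMI can be used in a similar spirit: compare the values of an attribute before and after redaction and measure how much of the original information the redacted view still carries. The sketch below assumes normalization by the arithmetic mean of the two entropies, which is one of several common conventions; the helper names and sample values are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a sequence of labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(x, y):
    """Mutual information between two equally long label sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log((c * n) / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def nmi(x, y):
    """NMI normalized by the arithmetic mean of the two entropies (assumed)."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 1.0  # both sequences are constant, so they agree trivially
    return 2 * mutual_information(x, y) / (hx + hy)


# Hypothetical attribute values before and after redaction:
# two distinct values are collapsed into the same placeholder.
original = ["a", "b", "c", "d"]
redacted = ["*", "*", "c", "d"]

score = nmi(original, redacted)
print(score, 1 - score)
```

Here `1 - score` is one candidate measure of information loss: it is 0 when the redacted values determine the originals exactly and grows as redaction merges more values together.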

CONCLUSION

REFERENCES:
• Murali Mani, Mohamad Alawa, and Arunlal Kalyanasundaram. Query Language Constructs for Provenance.
• Tyrone Cadenhead, Vaibhav Khadilkar, Murat Kantarcioglu, and Bhavani Thuraisingham. 2011. Transforming provenance using redaction. In Proceedings of the 16th ACM Symposium on Access Control Models and Technologies (SACMAT '11). ACM, New York, NY, USA, 93-102.
• Tyrone Cadenhead, Vaibhav Khadilkar, Murat Kantarcioglu, and Bhavani Thuraisingham. A Language for Provenance Access Control.
• David F. Nettleton and Daniel Abril. 2015. An Information Retrieval Approach to Document Sanitization. In Advanced Research in Data Privacy. Springer International Publishing, 151-166.
• R. Baeza-Yates and B. Ribeiro-Neto. 2011. Modern Information Retrieval: The Concepts and Technology Behind Search, 2nd edn. ACM Press Books, England.

THANK YOU.