Data Warehousing, Data Mining, Privacy

Data Warehousing
• Repository of data providing organized and cleaned enterprise-wide data (obtained from a variety of sources) in a standardized format
  – Data mart (single subject area)
  – Enterprise data warehouse (integrated data marts)
  – Metadata
Farkas, CSCE 824

Data Mining
• DM: search for correlations, sequences, and trends
• Prediction tasks
  – Use some variables to predict unknown or future values of other variables
• Description tasks
  – Find human-interpretable patterns that describe the data

Knowledge Discovery in Databases: Process
Selection → Target Data → Preprocessing → Preprocessed Data → Data Mining → Patterns → Interpretation/Evaluation → Knowledge
Adapted from: U. Fayyad et al. (1995), “From Knowledge Discovery to Data Mining: An Overview,” Advances in Knowledge Discovery and Data Mining, U. Fayyad et al. (Eds.), AAAI/MIT Press

Data Mining Technologies
• Clustering: find groups of similar data items
• Classification: separate data items into predefined groups
• Association rule mining: find dependencies in data
• Sequential associations: identify event sequences that are likely
• Detect deviations: find outliers
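
As a small illustration of association rule mining, the sketch below computes the two standard rule measures, support and confidence, over a handful of transactions. The transactions and the function names `support` and `confidence` are invented for this example and do not come from the slides.

```python
# Hypothetical market-basket transactions, invented for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

# Rule "bread -> milk": how often the pair occurs, and how often
# milk appears given that bread does.
print(support({"bread", "milk"}, transactions))
print(confidence({"bread"}, {"milk"}, transactions))
```

A rule is usually reported only when both measures exceed analyst-chosen thresholds; the thresholds themselves are a design choice, not something fixed by the technique.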

DM Issues: Integrity
• Poor quality data: inaccurate, incomplete, missing metadata
• Loss of traditional consistency, e.g., keys
• Source data quality vs. derived data quality
  – Trust in the result of analysis?
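
One way the loss of key consistency can show up is when records from several sources are merged and a value that was a unique key in each source is no longer unique. The following is a minimal sketch of such a check; the rows and the helper `duplicate_keys` are hypothetical.

```python
from collections import Counter

def duplicate_keys(rows, key):
    """Return the sorted key values that occur in more than one row."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)

# Hypothetical rows merged from two sources; "id" was unique in each
# source but is no longer unique after integration.
merged = [
    {"id": 1, "name": "Ann"},
    {"id": 2, "name": "Ben"},
    {"id": 2, "name": "Benjamin"},
]

print(duplicate_keys(merged, "id"))
```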

Big Data Security and Privacy
• Large amount of data being considered
• Probabilistic inference
  – Existing inference prevention: guaranteed truth
• Privacy-preserving analytics

Big Data Integrity
Data accuracy:
• Source provenance
• End-point filtering and validation
• Data poisoning

Inference Problem
• DM: discover “new knowledge”; how to evaluate security risks?
• Example security risks:
  – Prediction of sensitive information
  – Misuse of information
• Assurance of “discovery”

Privacy and Sensitivity
• Large volume of private (personal) data
• Need:
  – Proper acquisition, maintenance, usage, and retention policy
  – Integrity verification
  – Control of analysis methods (aggregation may reveal sensitive data)
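
To make the point about aggregation concrete, here is a minimal sketch of a differencing attack, assuming a query interface that returns only sums and never individual rows. The salary table and the function `total_salary` are invented for illustration.

```python
# Hypothetical salary table, invented for illustration.
salaries = {"alice": 70000, "bob": 82000, "carol": 91000}

def total_salary(names):
    """Aggregate query: returns only a sum, never an individual row."""
    return sum(salaries[n] for n in names)

# Two aggregate queries that each look harmless on their own.
everyone = total_salary(salaries)
without_carol = total_salary(n for n in salaries if n != "carol")

# Their difference reveals one person's sensitive value.
inferred = everyone - without_carol
print(inferred)
```

This is why controlling which analyses may be run, and not only which rows may be read, appears on the list of needs.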

Privacy
• What is the difference between confidentiality and privacy?
• Identity, location, activity, etc.
• Anonymity vs. accountability

Social Network Analysis
• The mapping and measuring of relationships and flows between people, groups, organizations, computers, or other information processing entities
• Behavioral profiling
• Note: social network signatures
  – User names may change; family and friends are more difficult to change
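
The note on social network signatures can be sketched as a set-overlap comparison: if an account reappears under a new user name, its contact set may still resemble the old one. The example below uses Jaccard similarity as one possible measure (an assumption, not a method named in the slides); all account names and contact sets are invented.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical contact set of a known account.
old_contacts = {"ann", "ben", "cho", "dev"}

# Hypothetical contact sets of accounts with unfamiliar user names.
candidates = {
    "new_handle_1": {"ann", "ben", "cho", "eli"},
    "new_handle_2": {"fay", "gus"},
}

# The candidate whose contacts overlap most with the known account.
best = max(candidates, key=lambda c: jaccard(old_contacts, candidates[c]))
print(best, jaccard(old_contacts, candidates[best]))
```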

DM for Security
• Large-scale data analytics
  – Intrusion detection
  – Insider misuse detection
• Fraud detection
• User/group/web site profiling
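
As a rough illustration of deviation-based detection, the sketch below flags hours whose login count lies far from the mean, measured in standard deviations. This z-score rule is a deliberately simple stand-in chosen for this example, not a technique prescribed by the slides; the counts, the threshold, and the function `flag_outliers` are assumptions.

```python
from statistics import mean, pstdev

def flag_outliers(counts, threshold=2.0):
    """Indices whose value is more than `threshold` standard deviations from the mean."""
    mu = mean(counts)
    sigma = pstdev(counts)
    if sigma == 0:
        return []  # all values identical, nothing deviates
    return [i for i, c in enumerate(counts) if abs(c - mu) / sigma > threshold]

# Hypothetical login attempts per hour; the last hour is a burst.
logins_per_hour = [4, 5, 6, 5, 4, 5, 6, 5, 4, 60]

print(flag_outliers(logins_per_hour))
```

A single extreme value inflates the standard deviation it is measured against, so practical systems often use more robust statistics or learned models.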

Next Class
• Continue on cloud and DM