Ensuring Compliance of Patient Data with Big Data and BI
Ayad Shammout & Denny Lee
April 10-12, Chicago, IL
Agenda
- A Quick Big Data Primer
- Healthcare and Big Data
- Compliance and Auditing
- SQL Compliance Project
- Compliance and Auditing with Big Data and BI
- Big Data: Unstructured Volumes of Data
- Analytics: PowerPivot, Power View
What is Big Data?
- Volume: exceeds physical limits of vertical scalability
- Velocity: decision window is small compared to the data change rate
- Variety: many different formats make integration expensive
- Variability: many options or variable interpretations confound analysis
Data explosion
- 10x increase every five years
- 85% from new data types
- Volume, Velocity, Variety
- Hadoop, Cloud
"By 2015, organizations that build a modern information management system will outperform their peers financially by 20 percent." (Gartner, Mark Beyer, “Information Management in the 21st Century”)
- New Data Sources
- Non-traditional Data Types
- New Technologies
- Large Data Volumes
- New Questions & New Insights
- New Economics
Big Data Business Value
- 140,000-190,000
- 15 out of 17
- 1.5 million
- €250 billion
- 50-60%
- $300 billion
Data
Hadoop: The most visible face of Big Data
- MapReduce Layer: Task tracker, Job tracker
- HDFS Layer: Name node, Data node
HDInsight: Visit HadoopOnAzure.com
Healthcare and Big Data
Healthcare and IT
- Often the laggard in technology
- Yet application of IT to healthcare can radically change what we can do
  - Genomic sequencing
  - Proteomic sequencing
  - Incidence prediction
Healthcare Big Data Example Scenarios
- Clinical Trial Deviations: Viagra was originally developed to lower blood pressure and treat angina; it is now used to help with newborn pulmonary hypertension and altitude sickness
- Incidence Prediction: patients who missed 4 or more visits are twice as likely to have an asthmatic incident; a particular cardiac monitor sine wave points to a high likelihood of heart attack
- Campaigns: social media and advertising campaigns to understand user behavior and sentiment
- Patient Satisfaction: social media and advertising campaigns to understand user behavior and sentiment
BIDMC Auditing Scenario
- Auditing is a critical component of HIPAA in ensuring patient privacy
  - 1 billion+ rows of audit data
  - 146 mission-critical clinical applications
  - Comprehensive audits yield 300-500k transactions/day
  - HIPAA requires an audit system with 20 years of data
- Auditing Project
  - Available to the community as part of the Compliance SDK
  - Updating for SQL Server 2012, HDInsight, Power View, and Mobile BI*
"Creating an enterprise tool for consolidated storage, reporting and alerting of all application audit data - that's cool!" John Halamka’s Cool Technology of the Week (Wellsphere Top Health Blogger, Health Impact Award)
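
As a rough check on the volumes quoted in the auditing scenario, the daily transaction range can be projected over the 20-year retention period. This is a back-of-the-envelope sketch only: it assumes a constant daily rate and 365 days per year, neither of which is stated on the slide.

```python
# Back-of-the-envelope projection of retained audit rows.
# Assumptions (not from the slides): constant daily rate, 365 days per year.
DAYS_PER_YEAR = 365
RETENTION_YEARS = 20                      # retention period quoted on the slide
DAILY_LOW, DAILY_HIGH = 300_000, 500_000  # 300-500k transactions/day


def retained_rows(per_day: int, years: int = RETENTION_YEARS) -> int:
    """Rows accumulated over the retention period at a fixed daily rate."""
    return per_day * DAYS_PER_YEAR * years


low = retained_rows(DAILY_LOW)
high = retained_rows(DAILY_HIGH)
print(f"{low:,} to {high:,} rows over {RETENTION_YEARS} years")
```

Under these assumptions the retained volume lands in the low billions of rows, the same order of magnitude as the 1 billion+ rows already collected.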
BIDMC Compliance Project
- Use Excel 2013 PowerPivot and Power View
- HDInsight Azure
- HDInsight Windows
- SSAS (tabular)
- SQL Server 2008/2012
- SSIS ETL
- Logs to HDFS
- Audit Logs
Auditing Sensitive Information
Storage Infrastructure
- Audit Logs: transfer files to ASV via AzCopy, CloudExplorer, etc.
Storage Infrastructure
- Azure Storage Vault (ASV)
- Azure Blob Storage
- Azure Flat Network Storage
- Hadoop on Azure
- Compute Nodes (Medium VMs)
Storage Infrastructure
- Push data; Stream data; To compute; Back to Storage
- Azure Storage Vault (ASV)
- Azure Blob Storage
- Azure Flat Network Storage
- map, sort, shuffle, reduce
- Hadoop on Azure
- Compute Nodes (Medium VMs)
http://dennyglee.com/2013/03/18/why-use-blob-storage-with-hdinsight-on-azure/
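
The map, sort, shuffle, and reduce stages named on this slide can be illustrated with a minimal in-memory sketch in Python. This is not Hadoop or HDInsight code; the record layout (user, action) and the sample rows are hypothetical and are only meant to show how the stages fit together when counting audit events per user.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical audit records as (user, action) pairs; not real audit data.
records = [
    ("alice", "SELECT"),
    ("bob", "UPDATE"),
    ("alice", "SELECT"),
    ("carol", "DELETE"),
    ("bob", "SELECT"),
]


def map_phase(record):
    """Map: emit one (key, 1) pair per audit record, keyed by user."""
    user, _action = record
    yield (user, 1)


def shuffle_and_sort(pairs):
    """Sort/shuffle: order pairs by key and group all values for each key."""
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, [value for _key, value in group]


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one key."""
    return key, sum(values)


def run_job(data):
    """Run map, sort/shuffle, and reduce in sequence on an in-memory list."""
    mapped = [pair for record in data for pair in map_phase(record)]
    return dict(reduce_phase(key, values) for key, values in shuffle_and_sort(mapped))


counts = run_job(records)
print(counts)
```

In a real cluster the map and reduce functions run on the compute nodes and the grouped pairs travel over the network; here everything stays in one process for clarity.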
SSIS to HDInsight
SSIS Processing
SSAS Tabular of HoA Audit Data
Hadoop / Auditing: File sizes
- Currently testing gz vs. raw
- E.g. 12 MB raw text file vs. 633 KB gz file (~20x compression)

Query                                     Duration (s)
select count(*) from sql_audit_asv_raw    56.066
select count(*) from sql_audit_asv_gz     58.994

- 20x smaller size, ~same query time
- Approx. same map/reduce task utilization
- File size is 250 MB-1 GB
- SSIS package takes care of the size
- Future testing: avro, protobuf
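
The compression figure above can be checked with simple arithmetic, and the effect reproduced on sample data with Python's standard gzip module. This is a sketch under stated assumptions: 1 MB is taken as 1024 KB, and the log lines are hypothetical, repetitive audit-style text rather than the actual audit files, so the measured ratio will differ from the slide's.

```python
import gzip


def compression_ratio(raw_size: float, compressed_size: float) -> float:
    """Ratio of raw size to compressed size (both in the same unit)."""
    return raw_size / compressed_size


# Figures from the slide: 12 MB raw vs. 633 KB gz (assuming 1 MB = 1024 KB).
slide_ratio = compression_ratio(12 * 1024, 633)
print(f"slide ratio: {slide_ratio:.1f}x")

# Hypothetical, highly repetitive audit-style log lines (not real audit data).
sample = "\n".join(
    f"2013-04-10 12:00:{i % 60:02d}|app01|user{i % 25}|select count(*) from patients"
    for i in range(5000)
).encode("utf-8")

packed = gzip.compress(sample)
sample_ratio = compression_ratio(len(sample), len(packed))
print(f"sample ratio: {sample_ratio:.1f}x")

# Compression is lossless: decompressing returns the original bytes.
restored = gzip.decompress(packed)
```

The slide's numbers work out to a little over 19x, which matches the "~20x" rounding.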
Hadoop / Auditing: Formats
- For ease of processing, replace carriage returns within embedded SQL statements, e.g. a statement such as "select col1, col2 from tableA" that is spread over several lines becomes the single line "select col1, col2 from tableA"
- This allows you to create a Hive table using CR as the row delimiter (i.e. Hive does not have things like SQL quoted identifiers)
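
A minimal sketch of the carriage-return replacement described above, in Python. The function name flatten_sql and the position of the line break in the sample statement are hypothetical; the deck's actual pipeline performs this step in SSIS.

```python
import re


def flatten_sql(statement: str) -> str:
    """Replace each run of CR/LF (plus adjacent whitespace) with one space."""
    return re.sub(r"\s*[\r\n]+\s*", " ", statement).strip()


# Hypothetical embedded SQL statement containing a CR/LF line break.
multi_line = "select col1, col2\r\nfrom tableA"
single_line = flatten_sql(multi_line)
print(single_line)  # select col1, col2 from tableA
```

Once every statement sits on one line, each audit record occupies exactly one row, so a line-delimited Hive table can read the file without splitting a record in two.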
BI Connectivity: SQOOP, HiveODBC, Templeton, CSV, etc.
Big Data … Excel-lerated!
- SSIS: 2 servers, 3 months, 110 GB binary files
- SSIS extraction: 1.2 GB of text, 120 MB gz
- Hadoop to PowerPivot: 6 MB
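
The size reduction at each stage of this pipeline can be worked out directly from the figures on the slide. The sketch below assumes decimal units (1 GB = 1000 MB); the stage labels are paraphrased from the slide.

```python
# Sizes in MB at each stage, taken from the slide (assuming 1 GB = 1000 MB).
stages = [
    ("binary files", 110_000.0),
    ("SSIS-extracted text", 1_200.0),
    ("gz-compressed text", 120.0),
    ("PowerPivot workbook", 6.0),
]

# Reduction factor between each pair of consecutive stages.
step_factors = [
    (src_name, dst_name, src_size / dst_size)
    for (src_name, src_size), (dst_name, dst_size) in zip(stages, stages[1:])
]
for src_name, dst_name, factor in step_factors:
    print(f"{src_name} -> {dst_name}: {factor:.1f}x smaller")

# End-to-end reduction from the raw binary files to the workbook.
overall = stages[0][1] / stages[-1][1]
print(f"overall: {overall:,.0f}x smaller")
```

Most of the reduction comes from the SSIS extraction step (roughly 92x), followed by the aggregation into PowerPivot (20x) and gzip (10x).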
PowerPivot workbook of HoA Audit Data
Power View of HoA Audit Data