Apache Spark & Apache Zeppelin: Enterprise Security for production deployments

Apache Spark & Apache Zeppelin: Enterprise Security for production deployments
Vinay Shukla, Director, Product Management
Dec 8, 2016
Twitter: @neomythos

Thank You
© Hortonworks Inc. 2011 – 2016. All Rights Reserved

whoami
• Recovering Programmer, Product Management
• Spark for 2.5+ years, Hadoop for 3+ years
• Blog at www.vinayshukla.com
• Twitter: @neomythos
• Addicted to Yoga, Hiking, & Coffee
• Smallest contributor to Apache Zeppelin

What are the enterprise security requirements?
• Spark user should be authenticated
• Integrate with corporate LDAP/AD
• Allow only authorized users access
• Audit all access
• Protect data both in motion & at rest
• Make all security easy to manage
• …

Security: Rings of Defense
• Perimeter Level Security
  – Network Security (i.e. Firewalls)
• Data Protection
  – Wire encryption
  – HDFS TDE/DARE
  – Others
• Authentication
  – Kerberos
  – Knox (Other Gateways)
• Authorization
  – Apache Ranger/Sentry
  – HDFS Permissions
  – HDFS ACLs
  – YARN ACL
• OS Security

Interacting with Spark
[Diagram: Zeppelin, Spark Thrift Server, Spark Shell, and a REST Server, each with a Driver, in front of executors (Ex) in Spark on YARN]

Context: Spark Deployment Modes
• Spark on YARN
  – Spark driver (SparkContext) in YARN AM (yarn-cluster)
  – Spark driver (SparkContext) in local (yarn-client)
• Spark Shell & Spark Thrift Server run in yarn-client only
[Diagram: YARN-Client and YARN-Cluster layouts, each showing Client, Spark Driver, App Master, and Executor]

Spark on YARN
[Diagram: John Doe runs Spark Submit against a Hadoop Cluster containing the RM, Spark AM, Node Manager, Executor, and HDFS; the interactions are numbered 1 to 4]

Spark – Security – Four Pillars
• Authentication
• Authorization
• Audit
• Encryption
Ensure network is secure
Spark leverages Kerberos on YARN

Authenticate users with AD/LDAP
[Diagram: John Doe, KDC, AD/LDAP, Spark AM, NN, and the Hadoop Cluster, with steps numbered 1 to 7]
• kinit
• Get service ticket for Spark
• Use Spark ST, submit Spark Job
• Spark gets Namenode (NN) service ticket
• YARN launches Spark Executors using John Doe’s identity
• Executor reads from HDFS using John Doe’s delegation token

Spark – Kerberos – Example
    kinit -kt /etc/security/keytabs/johndoe.keytab johndoe@EXAMPLE.COM
    ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 512m --executor-cores 1 lib/spark-examples*.jar 10
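The two commands on the Kerberos example slide can be assembled programmatically before being handed to a process launcher. The following is a minimal sketch only: the helper names `kinit_command` and `spark_submit_command` are hypothetical, and nothing is executed, so no cluster or KDC is needed to try it.

```python
def kinit_command(keytab, principal):
    """Argument list for a keytab-based kinit, mirroring the slide."""
    return ["kinit", "-kt", keytab, principal]


def spark_submit_command(app_jar, main_class, app_args,
                         num_executors=3, driver_memory="512m",
                         executor_cores=1):
    """Argument list for the yarn-cluster spark-submit shown on the slide."""
    return [
        "./bin/spark-submit",
        "--class", main_class,
        "--master", "yarn-cluster",
        "--num-executors", str(num_executors),
        "--driver-memory", driver_memory,
        "--executor-cores", str(executor_cores),
        app_jar,
        *app_args,
    ]


if __name__ == "__main__":
    # The jar glob is only expanded if the command is run through a shell.
    commands = [
        kinit_command("/etc/security/keytabs/johndoe.keytab",
                      "johndoe@EXAMPLE.COM"),
        spark_submit_command("lib/spark-examples*.jar",
                             "org.apache.spark.examples.SparkPi", ["10"]),
    ]
    for cmd in commands:
        print(" ".join(cmd))
```

Keeping the commands as argument lists avoids quoting mistakes; the ticket obtained by kinit must exist before spark-submit runs.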

Allow only authorized users access to Spark jobs
[Diagram: John Doe, KDC, Ranger/Sentry, YARN Cluster, and HDFS holding files A, B, and C]
• Client gets service ticket for Spark
• Use Spark ST, submit Spark Job
• Get Namenode (NN) service ticket
• Executors read from HDFS
• Authorization checks (Ranger/Sentry): Can John launch this job? Can John read this file?

Secure data in motion: Wire Encryption with Spark
[Diagram: Spark Submit, RM, AM, Driver, NM with Shuffle Service, executors Ex 1 to Ex N, and a Data Source, connected by the channels below]
• Control/RPC
• Shuffle Data
• Shuffle BlockTransfer
• Read/Write Data
• FS – Broadcast, File Download

Spark Communication Encryption Settings
• Shuffle Data: NM > Ex leverages YARN-based SSL
• Control/RPC: spark.authenticate = true. Leverage YARN to distribute keys
• Shuffle BlockTransfer: spark.authenticate.enableSaslEncryption = true
• Read/Write Data: depends on data source. For HDFS, RPC (RC4 | 3DES) or SSL for WebHDFS
• FS – Broadcast, File Download: spark.ssl.enabled = true
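The three Spark properties named on the encryption settings slide can be supplied either as `--conf` flags on spark-submit or as lines in spark-defaults.conf. The sketch below renders both forms; the helper names `to_conf_flags` and `to_defaults_file` are hypothetical, and the YARN-side and HDFS-side settings from the slide are deliberately left out because they are configured outside Spark.

```python
# Spark-side properties listed on the slide; values are strings because
# Spark reads them as text.
WIRE_ENCRYPTION_CONF = {
    "spark.authenticate": "true",
    "spark.authenticate.enableSaslEncryption": "true",
    "spark.ssl.enabled": "true",
}


def to_conf_flags(conf):
    """Render properties as repeated `--conf key=value` arguments."""
    flags = []
    for key in sorted(conf):
        flags += ["--conf", f"{key}={conf[key]}"]
    return flags


def to_defaults_file(conf):
    """Render properties as whitespace-separated spark-defaults.conf lines."""
    return "\n".join(f"{key} {conf[key]}" for key in sorted(conf)) + "\n"


if __name__ == "__main__":
    print(" ".join(to_conf_flags(WIRE_ENCRYPTION_CONF)))
    print(to_defaults_file(WIRE_ENCRYPTION_CONF), end="")
```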

Sharp Edges with Spark Security
• SparkSQL – Only coarse-grained access control today
• Client -> Spark Thrift Server > Spark Executors – No identity propagation on 2nd hop
  – Lowers security, forces STS to run as Hive user to read all data
  – Use SparkSQL via shell or programmatic API
  – https://issues.apache.org/jira/browse/SPARK-5159
• Spark Streaming + Kafka + Kerberos – No SSL support yet
• Spark Shuffle > Only SASL, no SSL support
• Spark Shuffle > No encryption for spill to disk or intermediate data

SparkSQL: Fine-grained security

Key Features: Spark Column Security with LLAP
• Fine-grained column-level access control for SparkSQL.
• Fully dynamic policies per user. Doesn’t require views.
• Use standard Ranger policies and tools to control access and masking policies.
Flow:
1. SparkSQL gets data locations known as “splits” from HiveServer2 and plans query.
2. HiveServer2 authorizes access using Ranger. Per-user policies like row filtering are applied.
3. Spark gets a modified query plan based on dynamic security policy.
4. Spark reads data from LLAP. Filtering / masking guaranteed by LLAP server.
[Diagram: Spark Client, HiveServer2 (Authorization), Hive Metastore (Data Locations, View Definitions), Ranger Server (Dynamic Policies), and LLAP (Data Read, Filter Pushdown), labelled with steps 1 to 4]

Example: Per-User Row Filtering by Region in SparkSQL
Original Query:
    SELECT * from CUSTOMERS WHERE total_spend > 10000
Query rewrites based on dynamic Ranger policies:
• Spark User 1 (West Region), dynamic rewrite:
    SELECT * from CUSTOMERS WHERE total_spend > 10000 AND region = 'west'
• Spark User 2 (East Region), dynamic rewrite:
    SELECT * from CUSTOMERS WHERE total_spend > 10000 AND region = 'east'
LLAP Data Access:
    User ID   Region   Total Spend
    1         East     5,131
    2         East     27,828
    3         West     55,493
    4         West     7,193
    5         East     18,193
Fine-grained security for SparkSQL: http://bit.ly/2bLghGz and http://bit.ly/2bTX7Pm
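The effect of the per-user rewrite can be reproduced on the slide's five sample rows. This is an illustration only, under stated assumptions: it uses an in-memory SQLite table instead of LLAP, a hypothetical `USER_REGION` dictionary in place of Ranger policies, and made-up user names; it matches regions in the capitalization used by the table. It is not how Ranger or LLAP implement filtering.

```python
import sqlite3

# Sample rows from the slide: (user_id, region, total_spend).
CUSTOMERS = [
    (1, "East", 5131),
    (2, "East", 27828),
    (3, "West", 55493),
    (4, "West", 7193),
    (5, "East", 18193),
]

ORIGINAL_QUERY = (
    "SELECT user_id, region, total_spend FROM customers "
    "WHERE total_spend > 10000"
)

# Hypothetical stand-in for a per-user row-filter policy.
USER_REGION = {"user1": "West", "user2": "East"}


def make_db():
    """Create an in-memory table holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers "
        "(user_id INTEGER, region TEXT, total_spend INTEGER)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", CUSTOMERS)
    return conn


def rewrite(query, user):
    """Append the user's region predicate as a bound parameter."""
    return query + " AND region = ?", (USER_REGION[user],)


def run_as(conn, user):
    """Run the original query with the user's filter applied."""
    sql, params = rewrite(ORIGINAL_QUERY, user)
    return conn.execute(sql + " ORDER BY user_id", params).fetchall()


if __name__ == "__main__":
    db = make_db()
    for name in ("user1", "user2"):
        print(name, run_as(db, name))
```

Both users submit the same query text; only the appended predicate differs, which is why no per-user views are needed.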

Apache Zeppelin Security

Apache Zeppelin: Authentication + SSL
[Diagram: John Doe, a Firewall, an SSL connection, LDAP, and executors (Ex) in Spark on YARN, with steps numbered 1 to 3]

Security in Apache Zeppelin?
Zeppelin leverages Apache Shiro for authentication/authorization

Example shiro.ini
Edit with Ambari or your favorite text editor

    # ============
    # Shiro INI configuration
    # ============
    [main]
    ## LDAP/AD configuration

    [users]
    # The 'users' section is for simple deployments
    # when you only need a small number of statically-defined
    # set of User accounts.

    [urls]
    # The 'urls' section is used for url-based security
    #

Apache Zeppelin: AD Authentication
• Configure Zeppelin to use AD

    [main]
    activeDirectoryRealm = org.apache.zeppelin.server.ActiveDirectoryGroupRealm
    activeDirectoryRealm.systemUsername = XXXXX
    activeDirectoryRealm.systemPassword = XXXXXXXXX
    activeDirectoryRealm.searchBase = DC=hdpqa,DC=Example,DC=com
    activeDirectoryRealm.url = ldap://hdpqa.example.com:389
    activeDirectoryRealm.principalSuffix = @hdpqa.example.com
    activeDirectoryRealm.groupRolesMap = "CN=hdpdv_admin,DC=hdpqa,DC=example,DC=com":"admin"
    activeDirectoryRealm.authorizationCachingEnabled = true
    sessionManager = org.apache.shiro.web.session.mgt.DefaultWebSessionManager
    cacheManager = org.apache.shiro.cache.MemoryConstrainedCacheManager
    securityManager.cacheManager = $cacheManager
    securityManager.sessionManager = $sessionManager
    securityManager.sessionManager.globalSessionTimeout = 86400000
    shiro.loginUrl = /api/login
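A quick way to sanity-check an AD configuration like the one above is to load the [main] section and inspect the values Zeppelin will hand to Shiro. The sketch below is an assumption-laden illustration: it reads an abbreviated copy of the settings with Python's `configparser` rather than with Shiro itself, and the helper names `load_main` and `parse_group_roles` are hypothetical.

```python
import configparser
import re

# Abbreviated copy of the slide's settings, used as sample input.
SHIRO_INI = """\
[main]
activeDirectoryRealm = org.apache.zeppelin.server.ActiveDirectoryGroupRealm
activeDirectoryRealm.url = ldap://hdpqa.example.com:389
activeDirectoryRealm.groupRolesMap = "CN=hdpdv_admin,DC=hdpqa,DC=example,DC=com":"admin"
securityManager.sessionManager.globalSessionTimeout = 86400000
"""


def load_main(text):
    """Return the [main] section as a dict, keeping camelCase keys."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # do not lower-case option names
    parser.read_string(text)
    return dict(parser["main"])


def parse_group_roles(value):
    """Split '"<group DN>":"<role>"' pairs into a {dn: role} dict."""
    return dict(re.findall(r'"([^"]+)"\s*:\s*"([^"]+)"', value))


if __name__ == "__main__":
    main = load_main(SHIRO_INI)
    print(parse_group_roles(main["activeDirectoryRealm.groupRolesMap"]))
    timeout_ms = int(main["securityManager.sessionManager.globalSessionTimeout"])
    print("session timeout (hours):", timeout_ms / 3_600_000)
```

Restricting the delimiter to `=` matters here because the URL and the group-to-role map both contain colons.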

Apache Zeppelin: LDAP Authentication
• Configure Zeppelin to use LDAP

    [main]
    ldapRealm = org.apache.zeppelin.server.LdapGroupRealm
    ldapRealm = org.apache.shiro.realm.ldap.JndiLdapRealm
    ldapRealm.contextFactory.environment[ldap.searchBase] = DC=hdpqa,DC=example,DC=com
    ldapRealm.userDnTemplate = uid={0},OU=Accounts,DC=hdpqa,DC=example,DC=com
    ldapRealm.contextFactory.url = ldaps://hdpqa.example.com:636
    ldapRealm.contextFactory.authenticationMechanism = SIMPLE
    sessionManager = org.apache.shiro.web.session.mgt.DefaultWebSessionManager
    securityManager.sessionManager = $sessionManager
    # 86,400,000 milliseconds = 24 hours
    securityManager.sessionManager.globalSessionTimeout = 86400000
    shiro.loginUrl = /api/login

Don’t want passwords in clear in shiro.ini?
• Create an entry for the AD credential
  – Zeppelin leverages Hadoop Credential API

    hadoop credential create activeDirectoryRealm.systemPassword -provider jceks:///etc/zeppelin/conf/credentials.jceks
    Enter password:
    Enter password again:
    activeDirectoryRealm.systemPassword has been successfully created.
    org.apache.hadoop.security.alias.JavaKeyStoreProvider has been updated.

• Make credentials.jceks readable only by the Zeppelin user
  – chmod 400, so that only the Zeppelin process user can read the file and no other user has access to the credentials
• Edit shiro.ini
  – activeDirectoryRealm.systemPassword -provider jceks://etc/zeppelin/conf/credentials.jceks
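The permission requirement on the credential store can be verified with a small check. This is a sketch under assumptions: the function name `is_owner_only_readable` is hypothetical, it only inspects mode bits (it does not check which user owns the file), and the demo uses a temporary file rather than the real credentials.jceks.

```python
import os
import stat
import tempfile


def is_owner_only_readable(path):
    """True if the owner can read the file and group/other have no access."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    owner_can_read = (mode & stat.S_IRUSR) != 0
    others_have_access = (mode & 0o077) != 0
    return owner_can_read and not others_have_access


if __name__ == "__main__":
    # Demonstrate on a throwaway file instead of a real keystore.
    fd, demo_path = tempfile.mkstemp()
    os.close(fd)
    try:
        os.chmod(demo_path, 0o400)
        print("0400:", is_owner_only_readable(demo_path))
        os.chmod(demo_path, 0o644)
        print("0644:", is_owner_only_readable(demo_path))
    finally:
        os.remove(demo_path)
```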

Want to connect to LDAP over SSL?
• Change protocol to ldaps in shiro.ini

    ldapRealm.contextFactory.url = ldaps://hdpqa.example.com:636

• If LDAP is using a self-signed certificate, import the certificate into the truststore of the JVM running Zeppelin

    echo -n | openssl s_client -connect ldap.example.com:636 | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > /tmp/examplecert.crt
    keytool -import -keystore $JAVA_HOME/jre/lib/security/cacerts -storepass changeit -noprompt -alias mycert -file /tmp/examplecert.crt

Zeppelin + Livy E2E Security
[Diagram: John Doe, Zeppelin (LDAP/LDAPS to LDAP), Ispark Group Interpreter, Livy APIs (SPNego: Kerberos), Livy, Spark, and Yarn (Kerberos/RPC)]
Job runs as John Doe

Apache Zeppelin: Authorization in Zeppelin
• Note-level authorization
  – Grant permissions (Owner, Reader, Writer) to users/groups on Notes
  – LDAP Group integration
• Zeppelin UI Authorization
  – Allow only admins to configure interpreter
  – Configured in shiro.ini

    [urls]
    /api/interpreter/** = authc, roles[admin]
    /api/configurations/** = authc, roles[admin]
    /api/credential/** = authc, roles[admin]

Authorization at Data Level
• For Spark with Zeppelin > Livy > Spark
  – Identity propagation: jobs run as end-user
• For Hive with Zeppelin > JDBC interpreter
• Shell Interpreter
  – Runs as end-user
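The [urls] rules on the authorization slide can be read as "first matching pattern decides which role is required". The sketch below approximates that idea with `fnmatch`; it is not Shiro's Ant-style matcher and may differ on edge cases such as a path without a trailing segment. The function names and the choice to allow paths with no matching rule are assumptions made for this fragment.

```python
from fnmatch import fnmatchcase

# (pattern, required role) pairs copied from the slide's [urls] section.
URL_RULES = [
    ("/api/interpreter/**", "admin"),
    ("/api/configurations/**", "admin"),
    ("/api/credential/**", "admin"),
]


def required_role(path):
    """Return the role demanded by the first matching rule, or None."""
    for pattern, role in URL_RULES:
        if fnmatchcase(path, pattern):
            return role
    return None


def is_allowed(path, authenticated, roles):
    """Apply `authc, roles[...]` semantics to paths covered by a rule."""
    role = required_role(path)
    if role is None:
        return True  # assumption: no rule in this fragment covers the path
    return authenticated and role in roles


if __name__ == "__main__":
    print(is_allowed("/api/interpreter/setting", True, {"admin"}))
    print(is_allowed("/api/interpreter/setting", True, {"reader"}))
```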

Map admin role to AD Group
• Allows mapped AD group access to Configure Interpreters

    [main]
    activeDirectoryRealm = org.apache.zeppelin.server.ActiveDirectoryGroupRealm
    activeDirectoryRealm.systemUsername = XXXXX
    activeDirectoryRealm.systemPassword = XXXXXXXXX
    activeDirectoryRealm.searchBase = DC=hdpqa,DC=Example,DC=com
    activeDirectoryRealm.url = ldap://hdpqa.example.com:389
    activeDirectoryRealm.principalSuffix = @hdpqa.example.com
    activeDirectoryRealm.groupRolesMap = "CN=hdpdv_admin,DC=hdpqa,DC=example,DC=com":"admin"
    activeDirectoryRealm.authorizationCachingEnabled = true
    sessionManager = org.apache.shiro.web.session.mgt.DefaultWebSessionManager
    cacheManager = org.apache.shiro.cache.MemoryConstrainedCacheManager
    securityManager.cacheManager = $cacheManager
    securityManager.sessionManager = $sessionManager
    securityManager.sessionManager.globalSessionTimeout = 86400000

User reports: Can’t see interpreter page
• Zeppelin has URL-based access control enabled
• User does not have the role, or the role is incorrectly mapped

    [main]
    activeDirectoryRealm = org.apache.zeppelin.server.ActiveDirectoryGroupRealm
    activeDirectoryRealm.systemUsername = XXXXX
    activeDirectoryRealm.systemPassword = XXXXXXXXX
    activeDirectoryRealm.searchBase = DC=hdpqa,DC=Example,DC=com
    activeDirectoryRealm.url = ldap://hdpqa.example.com:389
    activeDirectoryRealm.principalSuffix = @hdpqa.example.com
    activeDirectoryRealm.groupRolesMap = "CN=hdpdv_admin,DC=hdpqa,DC=example,DC=com":"admin"
    activeDirectoryRealm.authorizationCachingEnabled = true
    sessionManager = org.apache.shiro.web.session.mgt.DefaultWebSessionManager
    cacheManager = org.apache.shiro.cache.MemoryConstrainedCacheManager
    securityManager.cacheManager = $cacheManager
    securityManager.sessionManager = $sessionManager
    securityManager.sessionManager.globalSessionTimeout = 86400000

User reports: Livy interpreter fails to run with access error
• Ensure Livy has ability to proxy user. Edit HDFS core-site.xml via Ambari:

    <property>
      <name>hadoop.proxyuser.livy_qa.groups</name>
      <value>*</value>
    </property>
    <property>
      <name>hadoop.proxyuser.livy_qa.hosts</name>
      <value>*</value>
    </property>

• Ensure Livy has impersonation enabled. In /etc/livy/conf/livy.conf:

    livy.impersonation.enabled true
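When troubleshooting the Livy access error, it helps to confirm that both proxy-user properties are actually present in core-site.xml. The following is a minimal sketch, assuming the properties sit inside a standard `<configuration>` root element; the sample XML and the helper names `proxyuser_settings` and `missing_proxyuser_keys` are hypothetical, and `livy_qa` is the service user taken from the slide.

```python
import xml.etree.ElementTree as ET

# Sample core-site.xml fragment modelled on the slide.
CORE_SITE = """<configuration>
  <property>
    <name>hadoop.proxyuser.livy_qa.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.livy_qa.hosts</name>
    <value>*</value>
  </property>
</configuration>"""


def proxyuser_settings(xml_text, user):
    """Return {'groups': ..., 'hosts': ...} entries found for the user."""
    root = ET.fromstring(xml_text)
    prefix = f"hadoop.proxyuser.{user}."
    settings = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name and name.startswith(prefix):
            settings[name[len(prefix):]] = prop.findtext("value")
    return settings


def missing_proxyuser_keys(xml_text, user):
    """List which of the two required proxy-user keys are absent."""
    found = proxyuser_settings(xml_text, user)
    return [key for key in ("groups", "hosts") if key not in found]


if __name__ == "__main__":
    print(proxyuser_settings(CORE_SITE, "livy_qa"))
    print(missing_proxyuser_keys(CORE_SITE, "livy_qa"))
```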

Apache Zeppelin: Credentials in Zeppelin
• LDAP/AD account
  – Zeppelin leverages Hadoop Credential API
• Interpreter Credentials
  – Not solved yet
This is still an open issue

Thank You
Vinay Shukla
@neomythos