DataLake Architecture and Implementation
Guobiao Mo (China Mobile)
Participation Team: China Mobile and QCT
April 2, 2019
Architecture Overview (diagram; labels as shown on the slide)
• ONAP Components
• OSS/BSS
• DMaaP/Kafka (JSON/XML)
• DataLake: DataLake Feeder, DL Admin (UI), MariaDB
• External Data Storage: TSDB (Prometheus, OpenTSDB), OLAP Store (Druid, ClickHouse), Document Store (MongoDB, Couchbase), Search Engine (Elasticsearch), Hadoop (HDFS, HBase …), Others
• Third Party Tools: Grafana, Superset (OLAP), Query/UI, Kibana
• Custom Apps (Spark, …)
Data Storage
Type | Product | Analytics Tool | Target
Document Store | Couchbase, MongoDB | Query UI | Document storage and retrieval
Time Series DB | Prometheus, OpenTSDB | Grafana | Monitoring, metrics, real-time analytics
OLAP Store | Druid, Kylin, ClickHouse | Superset | Interactive OLAP, time series analytics
Search Engine | Elasticsearch, Solr | Kibana, Grafana | Full-text search, interactive analytics
Hadoop | HDFS, HBase | Spark App, Hive | Mass batch data processing, custom applications
Others | Based on future demands
DataLake Admin UI Architecture (diagram; labels as shown on the slide)
• DataLake Admin UI: Frontend (Bootstrap 4 + Angular 7); Backend (Flask + Python 3), REST API Server
• DataLake Feeder: REST API
DataLake Admin UI Features
• Feeder settings
• DB connections
• Topic-specific configurations
  • Enabled/disabled
  • Which DBs to store the data
  • Authentication key
  • ID extraction
  • Data mapping
  • Data format (JSON, XML, …)
  • TTL
  • Save raw data
  • correlateClearedMessage
  Parent topic configurations act as default.
• Feeder operation
  • Start/stop Feeder
  • Show Feeder status and statistics
• Pre-configured 3rd-party tool dashboards and query scripts
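The note that parent topic configurations act as defaults can be pictured as a two-level lookup. The following is a minimal Python sketch under stated assumptions, not the DataLake implementation: the field names (`enabled`, `sinks`, `data_format`, `ttl_days`, `save_raw`), the values in `DEFAULT_TOPIC`, and the helper `resolve_topic_config` are all hypothetical and chosen only to mirror the settings listed on the slide.

```python
from typing import Any, Dict

# Hypothetical parent ("default") topic configuration; values are illustrative only.
DEFAULT_TOPIC: Dict[str, Any] = {
    "enabled": True,
    "sinks": ["mongodb"],
    "data_format": "JSON",
    "ttl_days": 30,
    "save_raw": False,
}


def resolve_topic_config(topic_overrides: Dict[str, Any],
                         parent: Dict[str, Any] = DEFAULT_TOPIC) -> Dict[str, Any]:
    """Return effective settings: parent values, replaced by topic-specific ones."""
    effective = dict(parent)           # copy so the parent is never mutated
    effective.update(topic_overrides)  # topic-specific values win
    return effective


# A topic that only overrides where its data is stored and how long it is kept.
overrides = {"sinks": ["elasticsearch", "druid"], "ttl_days": 7}
effective = resolve_topic_config(overrides)
print(effective)
```

Settings the topic does not mention (here `data_format` and `save_raw`) fall back to the parent, so a new topic needs only the values that differ.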
DataLake Feeder Architecture (diagram; labels as shown on the slide)
• DMaaP/Kafka
• Feeder: Pull Service, Store Service, Feeder Controller, Topic Controller, REST API, ORM
• Database services and targets: MongoDB Service (MongoDB), Couchbase Service (Couchbase), Elasticsearch Service (Elasticsearch), XXX Service (XXX)
• Druid's Kafka indexing service (Druid)
• MariaDB
• Admin UI
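One way to read the diagram is that a store step fans each pulled message out to one service per target database, with "XXX Service" standing for any further plug-in. The Python sketch below illustrates that reading only; the Feeder itself is described as a Spring Boot application, and the registry `STORE_SERVICES`, the decorator `register_store`, the function `store`, and the in-memory `SAVED` lists are hypothetical stand-ins rather than names from the project.

```python
from typing import Callable, Dict, List

# Hypothetical registry: database name -> function that stores one document.
StoreFn = Callable[[str, dict], None]
STORE_SERVICES: Dict[str, StoreFn] = {}

# In-memory stand-ins for real databases, so the sketch runs without any server.
SAVED: Dict[str, List[dict]] = {"mongodb": [], "elasticsearch": []}


def register_store(name: str) -> Callable[[StoreFn], StoreFn]:
    """Decorator that plugs a database-specific service into the registry."""
    def decorator(fn: StoreFn) -> StoreFn:
        STORE_SERVICES[name] = fn
        return fn
    return decorator


@register_store("mongodb")
def save_to_mongodb(topic: str, doc: dict) -> None:
    SAVED["mongodb"].append({"topic": topic, **doc})


@register_store("elasticsearch")
def save_to_elasticsearch(topic: str, doc: dict) -> None:
    SAVED["elasticsearch"].append({"topic": topic, **doc})


def store(topic: str, doc: dict, sinks: List[str]) -> List[str]:
    """Send one message to every configured sink; return sinks with no service."""
    skipped = []
    for name in sinks:
        service = STORE_SERVICES.get(name)
        if service is None:
            skipped.append(name)  # no plug-in registered for this database
            continue
        service(topic, doc)
    return skipped


skipped = store("example.topic", {"id": "1", "value": 42},
                ["mongodb", "elasticsearch", "couchbase"])
print(skipped)
```

Adding a database in this sketch means registering one more function; the dispatch loop does not change. Druid is left out of the registry on purpose, since the slide shows it being fed through Druid's own Kafka indexing service.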
DataLake Feeder Features
• Read data directly from Kafka for performance
• Support for pluggable databases
• Support REST API for inter-component communication
• Use MariaDB to store settings
• Make use of database ingestion facilities where available (e.g. Druid's Kafka indexing service)
• Support for pluggable data processing features
• Connect to Kafka and DBs through TLS
• Implemented in Spring Boot
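The idea of pluggable data processing can be sketched as a list of small steps applied to each message before it is stored. The Python example below is an illustration under assumptions, not the Feeder's Spring Boot code: `parse_message`, `extract_id`, `run_pipeline`, the `_id` key, and the sample field names `eventId` and `severity` are hypothetical, and the XML branch handles only flat documents.

```python
import json
import xml.etree.ElementTree as ET
from typing import Callable, List

Processor = Callable[[dict], dict]


def parse_message(raw: str, data_format: str) -> dict:
    """Turn a raw JSON or flat XML payload into a dict (nested XML is not handled)."""
    if data_format == "JSON":
        return json.loads(raw)
    if data_format == "XML":
        root = ET.fromstring(raw)
        return {child.tag: child.text for child in root}
    raise ValueError(f"unsupported data format: {data_format}")


def extract_id(field: str) -> Processor:
    """Build a processing step that copies a configured field into '_id'."""
    def processor(doc: dict) -> dict:
        out = dict(doc)  # leave the input document untouched
        out["_id"] = doc[field]
        return out
    return processor


def run_pipeline(doc: dict, processors: List[Processor]) -> dict:
    """Apply each processing step in order and return the final document."""
    for step in processors:
        doc = step(doc)
    return doc


pipeline = [extract_id("eventId")]
from_json = run_pipeline(
    parse_message('{"eventId": "e-1", "severity": "MAJOR"}', "JSON"), pipeline)
from_xml = run_pipeline(
    parse_message("<event><eventId>e-2</eventId><severity>MINOR</severity></event>",
                  "XML"), pipeline)
print(from_json, from_xml)
```

Because every step has the same signature, a new processing feature is one more function appended to `pipeline`.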
Integration with DCAE
• DataLake Feeder and Admin delivered as services under DCAE
https://wiki.onap.org/display/DW/DataLake+POC
Links
• DataLake Proposal
https://wiki.onap.org/display/DW/DataLake
• DataLake Development Environment Setup
https://wiki.onap.org/display/DW/DataLake+Development+Environment+Setup
• Source Code
https://gerrit.onap.org/r/gitweb?p=dcaegen2/services.git;a=tree;f=components/datalakehandler;h=6b0d4ad9fe266722794edd497dfe6749e7e9eb62;hb=HEAD
Thank You