MongoDB exploration Zhang Gang 20130613 Brief Introduction

MongoDB exploration
Zhang Gang
2013/06/13

Brief Introduction to MongoDB
• MongoDB is a document database that provides high performance, high availability, and easy scalability.
• Features:
  • Document-oriented storage
    • JSON-style documents with dynamic schemas offer simplicity and power
  • Full index support
  • Replication & high availability
  • Auto-sharding
  • GridFS
  • Map/reduce
    • Flexible aggregation and data processing
2/32 2021/2/24

Brief Introduction to MongoDB
• BSON (Binary JSON)
  • BSON is a lightweight binary format capable of representing any MongoDB document as a string of bytes. The database understands BSON, and BSON is the format in which documents are saved to disk.
  • Represents data efficiently, without using much extra space
  • Fast to encode and decode
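
To make the "string of bytes" idea concrete, here is a minimal sketch of a BSON encoder for a few value types only (string, 32-bit integer, double, boolean, embedded document). It is an illustration of the byte layout, not the encoder MongoDB or its drivers use; the function name `encode_bson` is made up for this example.

```python
import struct


def encode_bson(doc):
    """Encode a dict into BSON bytes (illustrative subset of the format)."""
    body = b""
    for key, value in doc.items():
        name = key.encode("utf-8") + b"\x00"  # element name is a C string
        if isinstance(value, bool):  # check bool before int (bool is an int)
            body += b"\x08" + name + (b"\x01" if value else b"\x00")
        elif isinstance(value, str):
            data = value.encode("utf-8") + b"\x00"
            # string: int32 byte length (including the NUL), then the bytes
            body += b"\x02" + name + struct.pack("<i", len(data)) + data
        elif isinstance(value, int):
            body += b"\x10" + name + struct.pack("<i", value)  # int32 only
        elif isinstance(value, float):
            body += b"\x01" + name + struct.pack("<d", value)  # 64-bit double
        elif isinstance(value, dict):
            body += b"\x03" + name + encode_bson(value)  # embedded document
        else:
            raise TypeError("unsupported type in this sketch: %r" % type(value))
    # document = int32 total length + elements + terminating NUL byte
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


print(encode_bson({"hello": "world"}))
```

The length prefixes are what make BSON fast to traverse: a reader can skip a string or an embedded document without parsing its contents.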

Brief Introduction to MongoDB
• Data files
  • Each database has a single .ns file and several data files with monotonically increasing numeric extensions, such as foo.0, foo.1, foo.2.
  • Each new numeric data file for a database is double the size of the previous one.
  • MongoDB also preallocates data files to ensure consistent performance.
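
The doubling rule can be worked through with a few lines of arithmetic. This sketch assumes a first file of 64 MB and a cap of 2 GB per file; those two numbers are the commonly documented defaults for the memory-mapped engine of that era and are not stated on the slide, so treat them as assumptions.

```python
def data_file_sizes_mb(count, first_mb=64, cap_mb=2048):
    """Sizes (in MB) of foo.0, foo.1, ... when each new file doubles.

    first_mb and cap_mb are assumed defaults, not values from the slides.
    """
    sizes = []
    size = first_mb
    for _ in range(count):
        sizes.append(size)
        size = min(size * 2, cap_mb)  # double until the cap is reached
    return sizes


sizes = data_file_sizes_mb(7)
print(sizes, "total MB:", sum(sizes))
```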

Brief Introduction to MongoDB
• Memory-mapped storage engine
  • It memory-maps all its data files when the server starts.
  • The operating system manages flushing data to disk and paging data in and out.
  • Simple and fast
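
As a small illustration of the mechanism (not of MongoDB's own code), the sketch below preallocates a temporary file, memory-maps it, writes through the mapping, and reads the bytes back from the file. The function name `mmap_roundtrip` is hypothetical, and it assumes a platform where `os.ftruncate` is available.

```python
import mmap
import os
import tempfile


def mmap_roundtrip(payload, size=4096):
    """Write payload through a memory mapping and read it back from disk."""
    if len(payload) > size:
        raise ValueError("payload larger than the mapped file")
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, size)  # preallocate the file, as a data file would be
        with mmap.mmap(fd, size) as mm:
            mm[:len(payload)] = payload  # plain memory write, no write() call
            mm.flush()  # ask the OS to push dirty pages to disk
        with open(path, "rb") as f:
            return f.read(len(payload))
    finally:
        os.close(fd)
        os.remove(path)


print(mmap_roundtrip(b"a document"))
```

The application only touches memory; when pages are read from or written to disk is left to the operating system, which is why the slide calls the approach simple.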

Brief Introduction to MongoDB
• Some concepts in MongoDB

Brief Introduction to MongoDB
• Some features we are interested in
  • Rich data model: as a NoSQL database, it is schema-less and very flexible
  • Easy to extend, with high availability
  • A database that sits between an RDBMS and a non-relational store
  • Feature-rich, with a powerful query language that can do most things SQL can

Data Model

Data model in MongoDB
• Data come directly from MySQL tables.
  • All fields are peers (at the same level).
  • Every record has the same fields.
• We did not consider any MongoDB-specific schema design, e.g. embedded documents.
• The data in Mongo therefore look like a table schema.
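
A one-to-one import of this kind can be sketched as follows. The column names and values are hypothetical placeholders, not the real table layout; the point is only that each MySQL row becomes one flat document with no embedded sub-documents.

```python
def row_to_document(columns, row):
    """Turn one table row into a flat document with one field per column."""
    if len(columns) != len(row):
        raise ValueError("row length does not match the column list")
    return dict(zip(columns, row))


# Hypothetical columns and values, for illustration only.
columns = ["JobID", "User", "Site", "ExecTime"]
rows = [(1, "alice", "SiteA", 120.0), (2, "bob", "SiteB", 95.5)]

documents = [row_to_document(columns, r) for r in rows]
print(documents[0])
```

With a driver such as pymongo, a list like `documents` is what would be handed to the collection's insert call.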

Data model in MongoDB
• What one record looks like in Mongo

Data model in MongoDB
• Data size comparison
  • Mongo: about 14 GB
  • MySQL: 5.6 GB
  • The same data therefore take about 2.5 times more space in Mongo.

A simple test
• Simple test: generate a plot
  • Use the basic query command "find".
  • LHCb: about 5 s (this likely includes network delay)
  • Badger02: about 2.6 s
• So, go on to explore detailed analysis with Mongo
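
A timing of this kind can be sketched as below. The function accepts any object with a `find(query)` method that returns an iterable, which is how a pymongo collection behaves; `ListCollection` is a made-up in-memory stand-in so the sketch runs without a server, and the field names are hypothetical.

```python
import time


def timed_find(collection, query):
    """Run find(query), drain the cursor, and return (documents, seconds)."""
    start = time.perf_counter()
    docs = list(collection.find(query))  # draining includes transfer time
    return docs, time.perf_counter() - start


class ListCollection:
    """In-memory stand-in for a collection: exact-match find over a list."""

    def __init__(self, docs):
        self._docs = docs

    def find(self, query):
        return (d for d in self._docs
                if all(d.get(k) == v for k, v in query.items()))


coll = ListCollection([{"Site": "SiteA"}, {"Site": "SiteB"}])
docs, seconds = timed_find(coll, {"Site": "SiteA"})
print(len(docs), "documents in", round(seconds, 6), "s")
```

Draining the cursor inside the timed region matters: a cursor fetches results lazily, so timing only the `find` call would miss most of the network cost.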

Deploy a test cluster

Replica sets
• High availability
  • Replication ensures redundancy, backup, and automatic failover.
  • Replication occurs through replica sets.
• Members in a set
  • Primary
  • Secondary
  • Arbiter
  • Secondary-only, hidden, delayed, and non-voting
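
The member types above map onto fields of the replica set configuration document. The sketch below only builds that document; host names are hypothetical, and the helper `replica_set_config` is made up for this example.

```python
def replica_set_config(name, data_hosts, arbiter_host=None):
    """Build a replica set configuration document.

    Other member options use the same per-member form, for example
    "priority": 0 (secondary-only), "hidden": True, or "votes": 0.
    """
    members = [{"_id": i, "host": h} for i, h in enumerate(data_hosts)]
    if arbiter_host is not None:
        # An arbiter votes in elections but holds no data.
        members.append({"_id": len(members), "host": arbiter_host,
                        "arbiterOnly": True})
    return {"_id": name, "members": members}


# Hypothetical hosts; the real test cluster layout is not given here.
config = replica_set_config(
    "rs0", ["host1:27017", "host2:27017"], arbiter_host="host3:27017")
print(config)
```

A document of this shape is what `rs.initiate(...)` takes in the mongo shell, or what the `replSetInitiate` admin command takes from a driver.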

Replica sets
• Drivers know which member is the primary.
• If the primary goes down, a new one is elected from the secondaries.
• Data is replicated after it is written.
• A typical set has three members.
• Writes go only to the primary.
• Reads can also be served from secondaries.

Replica sets
• A three-member set
• Test auto-failover
  • Shut down the primary; after about 10 s, a new primary is elected and responds to the application.
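
During those roughly 10 seconds the application sees errors, so a client usually retries. The following is a minimal sketch of such a retry loop under stated assumptions: the helper name, the attempt count, and the delay are all illustrative, and the default retriable exception is Python's built-in `ConnectionError`. With pymongo one would pass `pymongo.errors.AutoReconnect` instead.

```python
import time


def retry_during_failover(operation, retriable=(ConnectionError,),
                          attempts=5, delay_s=3.0, sleep=time.sleep):
    """Call operation(), retrying while a new primary is being elected."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retriable:
            if attempt == attempts:
                raise  # still failing after the last attempt: give up
            sleep(delay_s)  # wait for the election to finish


# Demonstration with a fake operation that fails twice, then succeeds.
calls = {"n": 0}


def flaky_write():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("primary not available")
    return "ok"


print(retry_during_failover(flaky_write, sleep=lambda s: None))
```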

Sharding
• High scalability
  • Sharding is MongoDB's approach to scaling out.
  • Sharding automatically distributes collection data to new servers.
• Components of a sharded cluster
  • Shards
    • Usually each shard is a replica set.
  • Config servers
    • Each config server is a mongod instance that holds metadata about the cluster.
  • Mongos
    • Routes reads and writes from applications to the shards; applications do not access the shards directly.
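
The routing role of mongos can be illustrated with a simplified model of range-based sharding: the shard key space is cut into chunks at split points, and each chunk is assigned to a shard. This is a sketch of the idea only, not how mongos is implemented; `ChunkRouter` and the split points are made up for the example.

```python
import bisect


class ChunkRouter:
    """Map a shard key value to the shard that owns its chunk."""

    def __init__(self, split_points, shards):
        # n split points define n + 1 chunks: (-inf, s0), [s0, s1), ..., [sn, +inf)
        if len(shards) != len(split_points) + 1:
            raise ValueError("need exactly one shard per chunk")
        if list(split_points) != sorted(split_points):
            raise ValueError("split points must be sorted")
        self.split_points = list(split_points)
        self.shards = list(shards)

    def shard_for(self, key):
        # bisect_right puts a key equal to a split point into the upper chunk,
        # matching chunk ranges that include their lower bound.
        return self.shards[bisect.bisect_right(self.split_points, key)]


router = ChunkRouter([100, 200], ["shard_1", "shard_2", "shard_1"])
print(router.shard_for(50), router.shard_for(100), router.shard_for(250))
```

In the real cluster the chunk table lives on the config servers, and mongos caches it to make this lookup.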

Sharding

Sharding
• Deploy a sharded cluster
  • Two shards: shard_1 on badger01, shard_2 on badger02
  • Each shard is a single mongod instance.
  • Three config servers: two on badger02, one on badger01
  • One mongos instance
• Start the cluster

Sharding
• Configure the cluster
  • Connect to mongos
  • Add the shards to the cluster
  • Enable sharding: shard the data by enabling sharding on the database and then on the collection
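
The three configuration steps correspond to the admin commands `addShard`, `enableSharding`, and `shardCollection`, issued against mongos. The sketch below only builds the command documents. The ports, the database and collection names, and the shard key are hypothetical; only the host names badger01 and badger02 come from the slides.

```python
def sharding_commands(shard_hosts, database, collection, shard_key):
    """Return the admin command documents, in the order they should be run."""
    commands = [{"addShard": host} for host in shard_hosts]
    commands.append({"enableSharding": database})
    commands.append({
        "shardCollection": "%s.%s" % (database, collection),  # full namespace
        "key": {shard_key: 1},  # ascending shard key
    })
    return commands


# Ports, names and shard key below are assumptions for illustration.
commands = sharding_commands(
    ["badger01:27018", "badger02:27018"], "jobsdb", "jobs", "JobID")
for cmd in commands:
    print(cmd)
    # With pymongo connected to mongos, each document could be sent with
    # client.admin.command(cmd).
```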

Sharding

Data detail analysis

Aggregation
• Query with raw data
  • Aggregation provides powerful and flexible tools for data aggregation tasks.
• Map/reduce
  • Handles complex aggregation tasks
• Aggregation framework
  • For queries that do not need map/reduce
  • Documents from a collection pass through an aggregation pipeline.
  • A pipeline consists of several pipeline operators:
    • $match
    • $group
    • $project
    • $sort
    • ...
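
A pipeline using the four operators listed above might look like the sketch below. Field names (`JobType`, `User`, `ExecTime`) and the job type value are assumptions for illustration. The second function is a plain-Python rendering of what the pipeline computes, so the example can run without a server; it is not part of MongoDB.

```python
from collections import defaultdict


def build_pipeline(job_type):
    """Jobs and total execution time per user for one job type."""
    return [
        {"$match": {"JobType": job_type}},  # filter first
        {"$group": {"_id": "$User",  # one output document per user
                    "jobs": {"$sum": 1},
                    "totalExec": {"$sum": "$ExecTime"}}},
        {"$project": {"_id": 0, "User": "$_id", "jobs": 1, "totalExec": 1}},
        {"$sort": {"jobs": -1}},  # busiest users first
    ]


def group_jobs_by_user(docs, job_type):
    """In-memory equivalent of build_pipeline, for illustration."""
    totals = defaultdict(lambda: {"jobs": 0, "totalExec": 0})
    for d in docs:
        if d.get("JobType") != job_type:
            continue
        totals[d["User"]]["jobs"] += 1
        totals[d["User"]]["totalExec"] += d["ExecTime"]
    rows = [{"User": u, **t} for u, t in totals.items()]
    rows.sort(key=lambda r: r["jobs"], reverse=True)
    return rows


docs = [
    {"User": "alice", "JobType": "MCSimulation", "ExecTime": 10},
    {"User": "bob", "JobType": "MCSimulation", "ExecTime": 5},
    {"User": "alice", "JobType": "MCSimulation", "ExecTime": 20},
    {"User": "bob", "JobType": "User", "ExecTime": 7},
]
print(group_jobs_by_user(docs, "MCSimulation"))
```

With pymongo, the list returned by `build_pipeline` is what would be passed to the collection's `aggregate` method.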

Aggregation
• SQL to Aggregation Framework mapping chart

Aggregation
• One problem
  • The result of an aggregation is a single document.
  • A document in Mongo must be smaller than 16 MB.
• Solution: the next version of Mongo will add an output operator ($out) to deal with this.
• Use the basic query command "find()"
  • It returns a cursor.
  • Iterate the cursor to process the data.
  • Use matplotlib to generate plots.
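
The find-and-iterate approach can be sketched as a function that consumes any iterable of documents, which is what a cursor is from Python's point of view. The field names `User`, `CPUTime`, and `ExecTime` are assumptions, as is the definition of efficiency as total CPU time divided by total execution time.

```python
from collections import defaultdict


def cpu_efficiency_per_user(cursor):
    """Accumulate CPU and wall-clock time per user while iterating."""
    cpu = defaultdict(float)
    wall = defaultdict(float)
    for doc in cursor:  # one document at a time, so no 16 MB result limit
        cpu[doc["User"]] += doc["CPUTime"]
        wall[doc["User"]] += doc["ExecTime"]
    return {user: cpu[user] / wall[user] for user in cpu if wall[user] > 0}


# A plain list stands in for the cursor returned by find().
sample = [
    {"User": "alice", "CPUTime": 50, "ExecTime": 100},
    {"User": "alice", "CPUTime": 30, "ExecTime": 100},
    {"User": "bob", "CPUTime": 90, "ExecTime": 100},
]
efficiency = cpu_efficiency_per_user(sample)
print(efficiency)
# With matplotlib, a bar chart could then be drawn with
# plt.bar(list(efficiency), list(efficiency.values())).
```

Because only running totals are kept in memory, this works for result sets far larger than a single document could hold.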

Raw Detail Analysis
• Try to generate some plots
  • Job efficiency per user/site...
  • Number of successful and failed jobs per user/site...
  • DiskSpace vs. ExecTime
  • ...
• Time range: 1 year
  • 09.01~10.01
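
Counting successful and failed jobs per user or per site is a single pass over the documents. In this sketch the field names (`Site`, `User`, `Status`) and the status values are hypothetical; the same function serves both groupings by changing `group_field`.

```python
from collections import Counter


def count_status_by(docs, group_field, status_field="Status"):
    """Count documents per (group value, status) pair."""
    counts = Counter()
    for doc in docs:
        counts[(doc[group_field], doc[status_field])] += 1
    return counts


# Hypothetical job records, for illustration only.
jobs = [
    {"User": "alice", "Site": "SiteA", "Status": "Done"},
    {"User": "bob", "Site": "SiteA", "Status": "Failed"},
    {"User": "alice", "Site": "SiteB", "Status": "Done"},
    {"User": "alice", "Site": "SiteA", "Status": "Done"},
]
per_site = count_status_by(jobs, "Site")
print(per_site[("SiteA", "Done")], per_site[("SiteA", "Failed")])
```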

Raw data analysis
• Sample script
• Indexes

Raw Detail Analysis
• CPU efficiency per user

Raw Detail Analysis
• CPU efficiency, JobType: MC Simulation

Detail Analysis
• Job major status
• Compared with the CHEP-2010 paper; time range about 3 months

Detail Analysis
• ExecTime vs. DiskSpace
  • Per user
  • Per site

Thanks