A Deep Dive into the PyMongo Driver
Joe Drumgoole, Director of Developer Advocacy, EMEA
21 July 2016, V1.0

[Diagram: MongoDB Query Language (MQL) + Native Drivers; MongoDB Document/JSON Data Model; WiredTiger, MMAP, In-memory, Encrypted, 3rd party; Security; Sharded Clusters; Replica Sets; Management]

Drivers and Frameworks
[Diagram: MEAN Stack, Morphia]

BSON Side Bar
• MongoDB uses a binary format of JSON called BSON (Binary JSON)
• Adds type and size information
• Allows efficient parsing and skipping
• You can use MongoDB drivers without ever knowing that BSON exists
• Open standard (http://bsonspec.org/, licensed under the Creative Commons)
• There are BSON libraries in every driver if you fancy trying it out
• Similar to Google Protocol Buffers
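To make "adds type and size information" concrete, here is a minimal sketch of a BSON encoder written with only the standard library. It is not the driver's encoder: the function names are hypothetical, and it only handles flat documents with string and 32-bit integer values. The byte layout follows the published BSON spec.

```python
import struct


def encode_element(name, value):
    """Encode one key/value pair as a BSON element: type byte, name, payload."""
    key = name.encode("utf-8") + b"\x00"      # element names are C strings
    if isinstance(value, str):
        data = value.encode("utf-8") + b"\x00"
        # Type 0x02: UTF-8 string, prefixed with its byte length as an int32
        return b"\x02" + key + struct.pack("<i", len(data)) + data
    if isinstance(value, int) and not isinstance(value, bool):
        # Type 0x10: 32-bit signed integer (struct raises if it does not fit)
        return b"\x10" + key + struct.pack("<i", value)
    raise TypeError("this sketch only handles str and int32 values")


def encode_document(doc):
    """Encode a flat dict: int32 total size, the elements, then a 0x00 byte."""
    body = b"".join(encode_element(k, v) for k, v in doc.items())
    total = 4 + len(body) + 1                 # size field + elements + terminator
    return struct.pack("<i", total) + body + b"\x00"


encoded = encode_document({"hello": "world"})
print(encoded)
# b'\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00'
```

The leading size field is what lets a parser skip a whole document without reading it, and the per-element type byte is what JSON lacks.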

Single Server
[Diagram: Driver, mongod]

Replica Set
[Diagram: Driver, Primary, Secondary, Secondary]

Replica Set Primary Failure
[Diagram: Driver, Secondary, Secondary]

Replica Set Election
[Diagram: Driver, Secondary, Secondary]

Replica Set New Primary
[Diagram: Driver, Primary, Secondary]

Replica Set Recovery
[Diagram: Driver, Secondary, Primary, Secondary]

Sharded Cluster
[Diagram: Driver, mongos, mongod, mongod, mongod]

Driver Responsibilities
• Authentication & Security
• Python<->BSON
• Error handling & Recovery
• Wire Protocol
• Topology Management
• Connection Pool
https://github.com/mongodb/mongo-python-driver

Example API Calls
    import pymongo

    client = pymongo.MongoClient(host="localhost", port=27017)
    database = client["test_database"]
    collection = database["test_collection"]
    collection.insert_one({"hello": "world", "goodbye": "world"})
    collection.find_one({"hello": "world"})
    collection.update_one({"hello": "world"}, {"$set": {"buenos dias": "world"}})
    collection.delete_one({"hello": "world"})

Start MongoClient
    from pymongo import MongoClient

    c = MongoClient("host1,host2", replicaSet="replset")

Client Side View
[Diagram: Mongo Client created with MongoClient("host1,host2", replicaSet="replset"); Primary host1; Secondary host2; Secondary host3]

Client Side View
[Diagram: Mongo Client; Primary host1; Monitor Thread 2; Secondary host2 with reply { ismaster: False, secondary: True, hosts: [host1, host2, host3] }; Secondary host3]

What Does ismaster show?
    >>> pprint(db.command("ismaster"))
    {u'hosts': [u'JD10Gen-old.local:27017',
                u'JD10Gen-old.local:27018',
                u'JD10Gen-old.local:27019'],
     u'ismaster': False,
     u'secondary': True,
     u'setName': u'replset',
     …}
    >>>

Topology
[Diagram: Current Topology, ismaster, New Topology]
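The "current topology plus ismaster reply gives new topology" step can be sketched as a pure function. This is a heavily simplified illustration, not the logic of the real server discovery and monitoring spec: the function name, the three server kinds, and the host names are assumptions made for the example.

```python
def update_topology(topology, address, reply):
    """Return a new topology description after one ismaster reply.

    `topology` maps "host:port" to "Unknown", "Primary" or "Secondary".
    """
    new_topology = dict(topology)
    if reply.get("ismaster"):
        # Only one primary at a time: demote any primary we knew about before.
        for host, kind in new_topology.items():
            if kind == "Primary":
                new_topology[host] = "Unknown"
        new_topology[address] = "Primary"
    elif reply.get("secondary"):
        new_topology[address] = "Secondary"
    else:
        new_topology[address] = "Unknown"
    # Members we had not heard of are added as Unknown until they are monitored.
    for host in reply.get("hosts", []):
        new_topology.setdefault(host, "Unknown")
    return new_topology


current = {"host1:27017": "Unknown", "host2:27017": "Unknown"}
reply_from_host2 = {
    "ismaster": False,
    "secondary": True,
    "hosts": ["host1:27017", "host2:27017", "host3:27017"],
}
new = update_topology(current, "host2:27017", reply_from_host2)
print(new)
# {'host1:27017': 'Unknown', 'host2:27017': 'Secondary', 'host3:27017': 'Unknown'}
```

Note how host3, which was never in the seed list, is discovered from the hosts field of the reply.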

Client Side View
[Diagram: Mongo Client; Primary host1; Monitor Thread 2; Secondary host2 ✔; Secondary host3]

Client Side View
[Diagram: Mongo Client; Primary host1; Monitor Thread 2; Monitor Thread 3; Secondary host2 ✔; Secondary host3]

Client Side View
[Diagram: Mongo Client; Your Code; Primary host1; Monitor Thread 2; Monitor Thread 3; Secondary host2 ✔; Secondary host3]

Next Is Insert
    c = MongoClient("host1,host2", replicaSet="replset")
    c.db.col.insert_one({"a": "b"})

Insert Will Block
[Diagram: Mongo Client; Your Code: Insert; Primary host1; Monitor Thread 2; Monitor Thread 3; Secondary host2 ✔; Secondary host3]

ismaster response from Host 1
[Diagram: Mongo Client; Your Code: Insert; ismaster from Primary host1; Monitor Thread 2; Monitor Thread 3; Secondary host2 ✔; Secondary host3]

Now Write Can Proceed
[Diagram: Mongo Client; Your Code: Insert; Primary host1 ✔; Monitor Thread 1; Monitor Thread 2; Monitor Thread 3; Secondary host2 ✔; Secondary host3]

Later Host 3 Responds
[Diagram: Mongo Client; Your Code; Primary host1 ✔; Monitor Thread 1; Monitor Thread 2; Monitor Thread 3; Secondary host2 ✔; Secondary host3 ✔]

Steady State
[Diagram: Mongo Client; Your Code; Primary host1 ✔; Monitor Thread 1; Monitor Thread 2; Monitor Thread 3; Secondary host2 ✔; Secondary host3 ✔]
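The sequence above (the insert blocks, a monitor hears from the primary, the write proceeds) can be imitated with standard-library threads. This is a toy model and not PyMongo's implementation: the Topology class, the monitor function, the delays, and the timeout value are all assumptions chosen for illustration.

```python
import threading
import time


class Topology:
    """Shared state: monitor threads write to it, application code waits on it."""

    def __init__(self):
        self._cond = threading.Condition()
        self.servers = {}                 # address -> "Primary" or "Secondary"

    def on_reply(self, address, kind):
        with self._cond:
            self.servers[address] = kind
            self._cond.notify_all()       # wake any operation waiting for a server

    def select_primary(self, timeout):
        """Block until a primary is known, or raise after `timeout` seconds."""
        def find():
            return next((a for a, k in self.servers.items() if k == "Primary"), None)

        with self._cond:
            if not self._cond.wait_for(lambda: find() is not None, timeout=timeout):
                raise TimeoutError("no primary found within %.1fs" % timeout)
            return find()


def monitor(topology, address, kind, delay):
    time.sleep(delay)                     # stands in for the ismaster round trip
    topology.on_reply(address, kind)


topology = Topology()
simulated = [("host2", "Secondary", 0.01), ("host1", "Primary", 0.05), ("host3", "Secondary", 0.10)]
threads = [threading.Thread(target=monitor, args=(topology, *s)) for s in simulated]
for t in threads:
    t.start()

primary = topology.select_primary(timeout=2.0)   # the "insert" blocks here
print("write goes to", primary)                  # write goes to host1
for t in threads:
    t.join()
```

The write is released as soon as host1 reports in; it does not wait for the slower host3 monitor, which matches the "Later Host 3 Responds" step.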

Life Intervenes
[Diagram: Mongo Client; Your Code; Primary host1 ✖; Monitor Thread 1; Monitor Thread 2; Monitor Thread 3; Secondary host2 ✔; Secondary host3 ✔]

Monitor may not detect
[Diagram: Mongo Client; Your Code: Insert, ConnectionFailure; Primary host1 ✖; Monitor Thread 1; Monitor Thread 2; Monitor Thread 3; Secondary host2 ✔; Secondary host3 ✔]

So Retry
[Diagram: Mongo Client; Your Code: Insert; ✖; Monitor Thread 1; Monitor Thread 2; Monitor Thread 3; Secondary host2 ✔; Secondary host3 ✔]

Check for Primary
[Diagram: Mongo Client; Your Code: Insert; ✖; Monitor Thread 1; Monitor Thread 2; Monitor Thread 3; Secondary host2 ✔; Secondary host3 ✔]

Host 2 Is Primary
[Diagram: Mongo Client; Your Code: Insert; ✖; Monitor Thread 1; Monitor Thread 2; Monitor Thread 3; Primary host2 ✔; Secondary host3 ✔]

Steady State
[Diagram: Mongo Client; Your Code; Primary host1 ✔; Monitor Thread 1; Monitor Thread 2; Monitor Thread 3; Secondary host2 ✔; Secondary host3 ✔]

What Does This Mean? - Connect
    import pymongo

    client = pymongo.MongoClient()
    try:
        client.admin.command("ismaster")
    except pymongo.errors.ConnectionFailure as e:
        print("Cannot connect: %s" % e)

What Does This Mean? - Queries
    import logging

    import pymongo

    def find_with_recovery(collection, query):
        try:
            return collection.find_one(query)
        except pymongo.errors.ConnectionFailure as e:
            logging.info("Connection failure: %s" % e)
            # Retry once; a second failure propagates to the caller
            return collection.find_one(query)

What Does This Mean? - Inserts
    import logging

    import pymongo
    from bson import ObjectId

    def insert_with_recovery(collection, doc):
        # Set _id on the client so a repeated insert shows up as a duplicate
        doc["_id"] = ObjectId()
        try:
            collection.insert_one(doc)
        except pymongo.errors.ConnectionFailure as e:
            logging.info("Connection error: %s" % e)
            try:
                collection.insert_one(doc)
            except pymongo.errors.DuplicateKeyError:
                # The first attempt reached the server before the failure
                pass

What Does This Mean? - Updates
    collection.update_one({"_id": 1}, {"$inc": {"counter": 1}})
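One reading of this slide is that, unlike the find and insert cases, an $inc cannot simply be sent again after a connection failure, because the client cannot tell whether the first attempt was applied. The toy simulation below illustrates that risk without talking to MongoDB at all; the classes and the dropped-acknowledgement behaviour are invented for the example.

```python
class NetworkError(Exception):
    """Stand-in for a connection failure raised by a driver."""


class FlakyCounterServer:
    """Toy server: applies every increment, but loses the first acknowledgement."""

    def __init__(self):
        self.counter = 0
        self._drop_next_ack = True

    def inc(self, amount):
        self.counter += amount            # the write is applied on the server...
        if self._drop_next_ack:
            self._drop_next_ack = False   # ...but the reply never reaches the client
            raise NetworkError("connection lost before the reply arrived")
        return {"ok": 1}


def inc_with_naive_retry(server, amount):
    try:
        return server.inc(amount)
    except NetworkError:
        # We cannot tell whether the first attempt was applied or not.
        return server.inc(amount)


server = FlakyCounterServer()
inc_with_naive_retry(server, 1)
print(server.counter)  # 2, although the caller asked for a single increment
```

The insert case avoids this problem because the client-generated _id turns a repeated insert into a detectable duplicate; a bare $inc carries no such marker.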

Configuration
• connectTimeoutMS: 30 s
• serverTimeoutMS: 30 s

[Diagram: connectTimeoutMS and serverTimeoutMS marked on the client view; Mongo Client; Your Code: Insert; ✖; Monitor Thread 1; Monitor Thread 2; Monitor Thread 3; Secondary host2 ✔; Secondary host3 ✔]

More Reading
• The spec author, Jesse Jiryu Davis, has a collection of links and his better version of this talk:
  https://emptysqua.re/blog/server-discovery-and-monitoring-in-mongodb-drivers/
• The full server discovery and monitoring spec is on GitHub:
  https://github.com/mongodb/specifications/blob/master/source/server-discovery-and-monitoring/server-discovery-and-monitoring.rst

insert_one
• Stages
  – Parse the parameters
  – Get a socket to write data on
  – Add the ObjectId
  – Convert the whole insert command parameters to a SON object
  – Apply the writeConcern to the command
  – Encode the message into a BSON object
  – Send the message to the server via the socket (TCP/IP)
  – Check for writeErrors (e.g. DuplicateKeyError)
  – Check for writeConcernErrors (e.g. writeTimeout)
  – Return Result object
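A few of the stages listed above can be sketched without a server: adding the _id, building an ordered command document, attaching the write concern, and checking a reply for errors. This is an outline under stated assumptions, not PyMongo's code. The function and exception names are hypothetical, an OrderedDict stands in for SON, and a random hex string stands in for a real ObjectId.

```python
import uuid
from collections import OrderedDict


class DuplicateKey(Exception):
    """Stand-in for a driver's duplicate-key error."""


class WriteConcernFailed(Exception):
    """Stand-in for a driver's write-concern error."""


def build_insert_command(collection_name, doc, write_concern):
    """Add an _id, build an ordered command, attach the write concern."""
    if "_id" not in doc:
        doc["_id"] = uuid.uuid4().hex[:24]    # placeholder; real drivers use ObjectId
    command = OrderedDict([                   # key order matters: command name first
        ("insert", collection_name),
        ("ordered", True),
        ("documents", [doc]),
    ])
    if write_concern:
        command["writeConcern"] = write_concern
    return command


def check_reply(reply):
    """Surface write errors first, then write-concern errors, then the result."""
    for error in reply.get("writeErrors", []):
        if error.get("code") == 11000:        # 11000 is the duplicate key error code
            raise DuplicateKey(error.get("errmsg", ""))
        raise RuntimeError(error.get("errmsg", "write error"))
    if "writeConcernError" in reply:
        raise WriteConcernFailed(reply["writeConcernError"].get("errmsg", ""))
    return {"acknowledged": True, "n": reply.get("n", 0)}


command = build_insert_command("col", {"a": "b"}, {"w": 1})
print(list(command))                          # ['insert', 'ordered', 'documents', 'writeConcern']
print(check_reply({"ok": 1, "n": 1}))         # {'acknowledged': True, 'n': 1}
```

The socket, BSON encoding, and send stages are left out on purpose; they need a live connection.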

Bulk Insert
    import pymongo

    bulker = collection.initialize_ordered_bulk_op()
    bulker.insert({"a": "b"})
    bulker.insert({"c": "d"})
    bulker.insert({"e": "f"})
    try:
        bulker.execute()
    except pymongo.errors.BulkWriteError as e:
        print("Bulk write error: %s" % e.details)

Bulk Write
• Create Bulker object
• Accumulate operations
• Each operation is created as a SON object
• The operations are accumulated in a list
• Once execute is called
  – For ordered, execute in the order added
  – For unordered, execute INSERTs, UPDATEs, then DELETEs
• Errors will abort the whole batch unless no write concern specified
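The difference between the two execution orders can be shown on a plain list of operations. This is an illustrative sketch with hypothetical names, not the driver's code. Batching consecutive operations of the same type in the ordered case is an assumption of the example; the slide only says that ordered operations run in the order added.

```python
from itertools import groupby

INSERT, UPDATE, DELETE = "insert", "update", "delete"


def ordered_batches(ops):
    """Keep the order added; consecutive operations of one type share a batch
    (the batching is an assumption of this sketch)."""
    return [(kind, [doc for _, doc in group])
            for kind, group in groupby(ops, key=lambda op: op[0])]


def unordered_batches(ops):
    """Group by type: all inserts, then all updates, then all deletes."""
    batches = []
    for kind in (INSERT, UPDATE, DELETE):
        docs = [doc for op_kind, doc in ops if op_kind == kind]
        if docs:
            batches.append((kind, docs))
    return batches


ops = [
    (INSERT, {"a": "b"}),
    (DELETE, {"a": "b"}),
    (INSERT, {"c": "d"}),
    (UPDATE, {"c": "d"}),
]
print(ordered_batches(ops))
print(unordered_batches(ops))
```

With these four operations the ordered form yields four batches in the original sequence, while the unordered form yields three: both inserts together, then the update, then the delete.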