Using Q 4 M a message queue storage

  • Slides: 53
Download presentation
Using Q 4 M a message queue storage engine for My. SQL Cybozu Labs,

Using Q 4 M a message queue storage engine for My. SQL Cybozu Labs, Inc. Kazuho Oku

Background Apr 22 2009 Using Q 4 M 2

Background Apr 22 2009 Using Q 4 M 2

Who am I? n. Name: Kazuho Oku (奥 一穂) n. Original Developer of Palmscape

Who am I? n. Name: Kazuho Oku (奥 一穂) n. Original Developer of Palmscape / Xiino n The oldest web browser for Palm OS n. Working at Cybozu Labs since 2005 n Research subsidiary of Cybozu, Inc. in Japan Apr 22 2009 Using Q 4 M 3

About Cybozu, Inc. n. Japan’s largest groupware vendor n Mostly provides as software products,

About Cybozu, Inc. n. Japan’s largest groupware vendor n Mostly provides as software products, not as services n Some of our apps bundle My. SQL as storage Apr 22 2009 Using Q 4 M 4

About Pathtraq n. Started in Aug. 2007 n. Web ranking service n One of

About Pathtraq n. Started in Aug. 2007 n. Web ranking service n One of Japan’s largest n like Alexa, but semi-realtime, and per-page n running on My. SQL n. Need for a fast and reliable message relay n for communication between the main server and content analysis server(s) Apr 22 2009 Using Q 4 M 5

Design Goals of Q 4 M n. Robust n Do not lose data on

Design Goals of Q 4 M n. Robust n Do not lose data on OS crash or power failure n. Fast n Transfer thousands of messages per second n. Easy to Use n Use SQL for access / maintenance n Integration into My. SQL nno more separate daemons to take care of Apr 22 2009 Using Q 4 M 6

What is a Message Queue? Apr 22 2009 Using Q 4 M 7

What is a Message Queue? Apr 22 2009 Using Q 4 M 7

What is a Message Queue? n. Middleware for persistent asynchronous communication n communicate between

What is a Message Queue? n. Middleware for persistent asynchronous communication n communicate between fixed pairs (parties) n a. k. a. Message Oriented Middleware n. MQ is intermediate storage n RDBMS is persistent storage n. Senders / receivers may go down Apr 22 2009 Using Q 4 M 8

Minimal Configuration of a MQ n. Senders and receivers access a single queue Sender

Minimal Configuration of a MQ n. Senders and receivers access a single queue Sender Receiver Queue Apr 22 2009 Using Q 4 M 9

MQ and Relays n. Separate queue for sender and receiver n. Messages relayed between

MQ and Relays n. Separate queue for sender and receiver n. Messages relayed between queues Relay Sender Queue Apr 22 2009 Receiver Queue Using Q 4 M 10

Merits of Message Relays n. Destination can be changed easily n Relays may transfer

Merits of Message Relays n. Destination can be changed easily n Relays may transfer messages to different locations depending on their headers n. Robustness against network failure n no loss or duplicates when the relay fails n. Logging and Multicasting, etc. Apr 22 2009 Using Q 4 M 11

Message Brokers n. Publish / subscribe model n Separation between components and their integration

Message Brokers n. Publish / subscribe model n Separation between components and their integration n Components read / write to predefined queues n Integration is definition of routing rules between the message queues n Messages are often transformed (filtered) within the relay agent Apr 22 2009 Using Q 4 M 12

What about Q 4 M? n. Q 4 M itself is a message queue

What about Q 4 M? n. Q 4 M itself is a message queue n. Can connect Q 4 M instances to create a message relay n. Provides API for creating message relays and brokers Apr 22 2009 Using Q 4 M 13

Performance of Q 4 M nover 7, 000 mess/sec. n message size: avg. 512

Performance of Q 4 M nover 7, 000 mess/sec. n message size: avg. 512 bytes n syncing to disk n. Outperforming most needs n if you need more, just scale out n Can coexist with other storage engines without sacrificing their performance see http: //labs. cybozu. co. jp/blog/kazuhoatwork/2008/06/q 4 m_06_release_and_benchmarks. php Apr 22 2009 Using Q 4 M 14

Applications of Q 4 M Apr 22 2009 Using Q 4 M 15

Applications of Q 4 M Apr 22 2009 Using Q 4 M 15

Asynchronous Updates n. Mixi (Japan's one of the largest SNS) uses Q 4 M

Asynchronous Updates n. Mixi (Japan's one of the largest SNS) uses Q 4 M to buffer writes to DB, to offload peak demands from http: //alpha. mixi. co. jp/blog/? p=272 Apr 22 2009 Using Q 4 M 16

Connecting Distant Servers n. Pathtraq uses Q 4 M to create a relay between

Connecting Distant Servers n. Pathtraq uses Q 4 M to create a relay between its database and content analysis processes → Contents to be analyzed → Pathtraq DB My. SQL conn. over SSL, gzip Content Analysis Processes ← Results of the analysis ← Apr 22 2009 Using Q 4 M 17

To Prefetch Data nlivedoor Reader (web-based feed aggregator) uses Q 4 M to prefetch

To Prefetch Data nlivedoor Reader (web-based feed aggregator) uses Q 4 M to prefetch data from database to memcached nuses Q 4 M for scheduling web crawlers as well from http: //d. hatena. ne. jp/mala/20081212/1229074359 Apr 22 2009 Using Q 4 M 18

Scheduling Web Crawlers n. Web crawlers with retry-on-error n. Sample code included in Q

Scheduling Web Crawlers n. Web crawlers with retry-on-error n. Sample code included in Q 4 M dist. If failed to fetch, store URL in retry queue Store Result Spiders Read URL DB Request Queue Retry Queue Rescheduler Apr 22 2009 Using Q 4 M 19

Delayed Content Generation n. Hatetter (RSS feed-to-twitter-API gateway) uses Q 4 M to delay

Delayed Content Generation n. Hatetter (RSS feed-to-twitter-API gateway) uses Q 4 M to delay content generation n Source code: github. com/yappo/website-hatetter Apr 22 2009 Using Q 4 M 20

User Notifications n. For sending notifications from web services DB App. Logic SMTP Agent

User Notifications n. For sending notifications from web services DB App. Logic SMTP Agent Queue(s) Apr 22 2009 Using Q 4 M IM Agent 21

Installing Q 4 M Apr 22 2009 Using Q 4 M 22

Installing Q 4 M Apr 22 2009 Using Q 4 M 22

Installing Q 4 M n. Compatible with My. SQL 5. 1 n. Download from

Installing Q 4 M n. Compatible with My. SQL 5. 1 n. Download from q 4 m. 31 tools. com n Binary releases available for some platforms n. Installing from source: n requires source code of My. SQL n. /configure && make install n run support-files/install. sql Apr 22 2009 Using Q 4 M 23

Configuration Options of Q 4 M n--with-sync=no|fsync|fdatasync|fcntl n Controls synchronization to disk n default:

Configuration Options of Q 4 M n--with-sync=no|fsync|fdatasync|fcntl n Controls synchronization to disk n default: fdatasync on linux n--enable-mmap n Mmap’ed reads lead to higher throughput n default: yes n--with-delete=pwrite|msync n msync recommended on linux>=2. 6. 20 if you need really high performance Apr 22 2009 Using Q 4 M 24

Q 4 M Basics Apr 22 2009 Using Q 4 M 25

Q 4 M Basics Apr 22 2009 Using Q 4 M 25

The Model n Various publishers write to queue n Set of subscribers consume the

The Model n Various publishers write to queue n Set of subscribers consume the entries in queue Publisher Q 4 M table Publisher Subscribers Publisher Apr 22 2009 Using Q 4 M 26

Creating a Q 4 M Table n ENGINE=QUEUE creates a Q 4 M table

Creating a Q 4 M Table n ENGINE=QUEUE creates a Q 4 M table n No primary keys or indexes n Sorted by insertion order (it’s a queue) Apr 22 2009 mysql> CREATE TABLE qt ( -> id int(10) unsigned NOT NULL, -> message varchar(255) NOT NULL -> ) ENGINE=QUEUE; Query OK, 0 rows affected (0. 42 sec) Using Q 4 M 27

Modifying Data on a Q 4 M Table n No restrictions for INSERT and

Modifying Data on a Q 4 M Table n No restrictions for INSERT and DELETE n No support for UPDATE mysql> INSERT INTO qt (id, message) -> VALUES -> (1, 'Hello'), -> (2, 'Bonjour'), -> (3, 'Hola'); Query OK, 3 rows affected (0. 02 sec) mysql> SELECT * FROM qt; +---------+ | id | message | +---------+ | 1 | Hello | | 2 | Bonjour | | 3 | Hola | +---------+ 3 rows in set (0. 00 sec) Apr 22 2009 Using Q 4 M 28

SELECT from a Q 4 M Table n Works the same as other storage

SELECT from a Q 4 M Table n Works the same as other storage engines n SELECT COUNT(*) is cached mysql> SELECT * FROM qt; +---------+ | id | message | +---------+ | 1 | Hello | | 2 | Bonjour | | 3 | Hola | +---------+ 3 rows in set (0. 00 sec) mysql> SELECT COUNT(*) FROM qt; +-----+ | COUNT(*) | +-----+ | 3 | +-----+ 1 row in set (0. 00 sec) How to subscribe to a queue? Apr 22 2009 Using Q 4 M 29

Calling queue_wait() n After calling, only one row becomes visible from the connection mysql>

Calling queue_wait() n After calling, only one row becomes visible from the connection mysql> SELECT * FROM qt; +---------+ | id | message | +---------+ | 1 | Hello | | 2 | Bonjour | | 3 | Hola | +---------+ 3 rows in set (0. 00 sec) mysql> SELECT queue_wait('qt'); +---------+ | queue_wait('qt') | +---------+ | 1 | +---------+ 1 row in set (0. 00 sec) mysql> SELECT * FROM qt; +---------+ | id | message | +---------+ | 1 | Hello | +---------+ 1 row in set (0. 00 sec) Apr 22 2009 Using Q 4 M 30

OWNER Mode and NON-OWNER Mode n. In OWNER mode, only the OWNED row is

OWNER Mode and NON-OWNER Mode n. In OWNER mode, only the OWNED row is visible n. OWNED row becomes invisible from other connections nrows of other storage engines are visible NON-OWNER Mode 1, 'Hello' 2, 'Bonjour' 3, 'Hola' Apr 22 2009 queue_wait() OWNER Mode 1, 'Hello' queue_end() queue_abort() Using Q 4 M 31

Returning to NON-OWNER mode n By calling queue_abort, the connection returns to NON-OWNER mode

Returning to NON-OWNER mode n By calling queue_abort, the connection returns to NON-OWNER mode mysql> SELECT QUEUE_ABORT(); +--------+ | QUEUE_ABORT() | +--------+ | 1 | +--------+ 1 row in set (0. 00 sec) mysql> SELECT * FROM qt; +---------+ | id | message | +---------+ | 1 | Hello | | 2 | Bonjour | | 3 | Hola | +---------+ 3 rows in set (0. 01 sec) Apr 22 2009 Using Q 4 M 32

Consuming a Row n By calling queue_end, the OWNED row is deleted, and connection

Consuming a Row n By calling queue_end, the OWNED row is deleted, and connection returns to NON-OWNER mode mysql> SELECT queue_wait('qt'); (snip) mysql> SELECT * FROM qt; +---------+ | id | message | +---------+ | 1 | Hello | +---------+ 1 row in set (0. 01 sec) mysql> SELECT queue_end(); +-------+ | queue_end() | +-------+ | 1 | +-------+ 1 row in set (0. 01 sec) mysql> SELECT * FROM qt; +---------+ | id | message | +---------+ | 2 | Bonjour | | 3 | Hola | +---------+ 2 rows in set (0. 00 sec) Apr 22 2009 Using Q 4 M 33

Writing a Subscriber n Call two functions: queue_wait, queue_end n Multiple subscribers can be

Writing a Subscriber n Call two functions: queue_wait, queue_end n Multiple subscribers can be run concurrently n each row in the queue is consumed only once while (true) { SELECT queue_wait('qt'); rows : = SELECT * FROM qt; if (count(rows) != 0) handle_row(rows[0]); SELECT queue_end(); } Apr 22 2009 # # # Using Q 4 M switch to owner mode obtain data if we have any data, then consume the row erase the row from queue 34

Writing a Subscriber (cont'd) n. Or call queue_wait as a condition n Warning: conflicts

Writing a Subscriber (cont'd) n. Or call queue_wait as a condition n Warning: conflicts with trigger-based insertions while (true) { rows : = SELECT * FROM qt WHERE queue_wait('qt'); if (count(rows) != 0) handle_row(rows[0]); SELECT queue_end(); } Apr 22 2009 Using Q 4 M 35

The Model – with code INSERT INTO queue. . . Publisher while (true) {

The Model – with code INSERT INTO queue. . . Publisher while (true) { rows : = SELECT * FROM qt WHERE queue_wait('qt'); if (count(rows) != 0) handle_row(rows[0]); SELECT queue_end(); } INSERT INTO queue. . . Publisher Q 4 M table Subscribers INSERT INTO queue. . . Publisher Apr 22 2009 Using Q 4 M 36

Three Functions in Detail Apr 22 2009 Using Q 4 M 37

Three Functions in Detail Apr 22 2009 Using Q 4 M 37

queue_wait(table) n. Enters OWNER mode n 0〜 1 row becomes OWNED n Enters OWNER

queue_wait(table) n. Enters OWNER mode n 0〜 1 row becomes OWNED n Enters OWNER mode even if no rows were available n Default timeout: 60 seconds n Returns 1 if a row is OWNED (0 on timeout) n. If called within OWNER mode, the owned row is deleted Apr 22 2009 Using Q 4 M 38

Revisiting Subscriber Code n. Calls to queue_end just before queue_wait can be omitted while

Revisiting Subscriber Code n. Calls to queue_end just before queue_wait can be omitted while (true) { rows : = SELECT * FROM qt WHERE queue_wait('qt'); if (count(rows) != 0) handle_row(rows[0]); SELECT queue_end(); } Apr 22 2009 Using Q 4 M 39

Conditional queue_wait() n. Consume rows of certain condition n Rows that do not match

Conditional queue_wait() n. Consume rows of certain condition n Rows that do not match will be left untouched n Only numeric columns can be checked n Fast - condition tested once per each row examples: SELECT queue_wait('table: (col_a*3)+col_b<col_c'); SELECT queue_wait('table: retry_count<5'); Apr 22 2009 Using Q 4 M 40

queue_wait(tbl_cond, [tbl_cond…, timeout]) n. Accepts multiple tables and timeout n. Data searched from leftmost

queue_wait(tbl_cond, [tbl_cond…, timeout]) n. Accepts multiple tables and timeout n. Data searched from leftmost table to right n. Returns table index (the leftmost table is 1) of the newly owned row n Returns zero if no rows are being owned example: SELECT queue_wait('table_A', 'table_B', 60); Apr 22 2009 Using Q 4 M 41

Functions for Exiting OWNER Mode nqueue_end n Deletes the owned row and exits OWNER

Functions for Exiting OWNER Mode nqueue_end n Deletes the owned row and exits OWNER mode nqueue_abort n Releases (instead of deleting) the owned row and exits OWNER mode n Close of a My. SQL connection does the same thing Apr 22 2009 Using Q 4 M 42

Relaying and Routing Messages Apr 22 2009 Using Q 4 M 43

Relaying and Routing Messages Apr 22 2009 Using Q 4 M 43

The Problem n. Relay (or router) consists of more than 3 processes, 2 conns

The Problem n. Relay (or router) consists of more than 3 processes, 2 conns n. No losses, no duplicates on crash or disconnection Q 4 M Table (source) Apr 22 2009 Relay Program Using Q 4 M Table (dest. ) 44

Internal Row ID n. Every row have a internal row ID n invisible from

Internal Row ID n. Every row have a internal row ID n invisible from Q 4 M table definition n monotonically increasing 64 -bit integer n. Used for detecting duplicates n Use two functions to skip duplicates n Data loss prevented by using queue_wait / queue_end Apr 22 2009 Using Q 4 M 45

queue_rowid() n. Returns row ID of the OWNED row (if any) n Returns NULL

queue_rowid() n. Returns row ID of the OWNED row (if any) n Returns NULL if no row is OWNED n. Call when retrieving data from source Apr 22 2009 Using Q 4 M 46

queue_set_srcid(src_tbl_id, mode, src_row_id) n. Call before inserting a row to destination table n Checks

queue_set_srcid(src_tbl_id, mode, src_row_id) n. Call before inserting a row to destination table n Checks if the row is already inserted into the table, and ignores next INSERT if true n. Parameters: n src_tbl_id - id to determine source table (0〜 63) n mode - "a" to drop duplicates, "w" to reset n src_row_id - row ID obtained from source table Apr 22 2009 Using Q 4 M 47

Pseudo Code n. Relays data from src_tbl to dest_tbl while (true) { # wait

Pseudo Code n. Relays data from src_tbl to dest_tbl while (true) { # wait for data SELECT queue_wait(src_tbl) => src_db; # read row and rowid row : = (SELECT * FROM src_tbl => src_db); rowid : = (SELECT queue_rowid() => src_db); # insert the row after setting srcid SELECT queue_set_srcid(src_tbl_id, 'a', rowid) => dest_db; INSERT INTO dest_tbl (row) => dest_db; } Apr 22 2009 Using Q 4 M 48

q 4 m-forward n. Simple forwarder script n installed into mysql-dir/bin usage: q 4

q 4 m-forward n. Simple forwarder script n installed into mysql-dir/bin usage: q 4 m-forward [options] src_addr dest_addr example: % support-files/q 4 m-forward "dbi: mysql: database=db 1; table=tbl 1; user=foo; password=XXX" "dbi: mysql: database=db 2; table=tbl 2; host=bar; user=foo" options: --reset duplicate check info. --sender=idx slot no. used for checking duplicates (0. . 63, default: 0) --help Apr 22 2009 Using Q 4 M 49

Limitations and the Future of Q 4 M Apr 22 2009 Using Q 4

Limitations and the Future of Q 4 M Apr 22 2009 Using Q 4 M 50

Things that Need to be Fixed n. Table compactions is a blocking operation n

Things that Need to be Fixed n. Table compactions is a blocking operation n runs when live data becomes <25% of log file n very bad, though not as bad as it seems nit's fast since it's a sequential write operation n. Relays are slow n since transfer is done row-by-row n. Binlog does not work n since MQ replication should be synchronous Apr 22 2009 Using Q 4 M 51

Future of Q 4 M n 2 -phase commit with other storage engines (maybe)

Future of Q 4 M n 2 -phase commit with other storage engines (maybe) n queue consumption and Inno. DB updates can become atomic operation Apr 22 2009 Using Q 4 M 52

Thank you http: //q 4 m. 31 tools. com/ Apr 22 2009 Using Q

Thank you http: //q 4 m. 31 tools. com/ Apr 22 2009 Using Q 4 M 53