ACID
▎ Atomicity
▎ Consistency
▎ Isolation
▎ Durability
Isolation levels
▎ Anomalies: dirty read, unrepeatable read, lost updates, phantoms, write skew
▎ Levels: read uncommitted, read committed, repeatable read, snapshot isolation, serializable, strict serializable + linearizable
Anomalous behavior
▎ Write skew anomalies are allowed by the snapshot isolation level
› Initial state of the account: Saving + Checking > 0
› Transaction 1 reads Saving and Checking and checks that Saving + Checking > T1
› Transaction 2 reads the same account and checks that Saving + Checking > T2
› Transaction 1 writes Saving = Saving - T1, transaction 2 writes Checking = Checking - T2, both commit
› Result: Saving + Checking < 0
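
The write skew above can be reproduced with a toy in-memory store. This is only a sketch under stated assumptions: `SnapshotStore`, `Tx` and `withdraw` are hypothetical names, and the store models snapshot isolation in the simplest way (private snapshot per transaction, first-committer-wins on write-write conflicts only). It is not how any particular database implements it.

```python
class SnapshotStore:
    """Toy key-value store with snapshot isolation (illustrative only)."""

    def __init__(self, data):
        self.data = dict(data)
        self.version = {key: 0 for key in data}

    def begin(self):
        return Tx(self)


class Tx:
    def __init__(self, store):
        self.store = store
        self.snapshot = dict(store.data)          # private snapshot taken at begin
        self.start_version = dict(store.version)
        self.writes = {}

    def read(self, key):
        return self.writes.get(key, self.snapshot[key])

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Snapshot isolation only rejects write-write conflicts on the same key.
        for key in self.writes:
            if self.store.version[key] != self.start_version[key]:
                return False
        for key, value in self.writes.items():
            self.store.data[key] = value
            self.store.version[key] += 1
        return True


def withdraw(tx, account, amount):
    # The check reads both accounts, but the write touches only one of them.
    if tx.read("saving") + tx.read("checking") > amount:
        tx.write(account, tx.read(account) - amount)


store = SnapshotStore({"saving": 60, "checking": 60})
t1, t2 = store.begin(), store.begin()
withdraw(t1, "saving", 100)     # T1 = 100
withdraw(t2, "checking", 100)   # T2 = 100
ok1, ok2 = t1.commit(), t2.commit()
total = store.data["saving"] + store.data["checking"]
print(ok1, ok2, total)
```

Both commits succeed because the write sets do not overlap, yet the combined balance ends up negative. A serializable level would have to abort one of the two transactions.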
General view
▎ Participants: client, server (holds the transaction context), shards
▎ Client to server: Begin Tx, Query 1, Query 2, Commit Tx
▎ Server to shards: Query 1 (lock), Query 2 (lock), Commit (?)
Tables
▎ A table has columns Column 1 … Column N and rows Row 1 … Row N
▎ Rows are distributed by primary key across DataShard 1 … DataShard N
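
A minimal sketch of how a primary key could be mapped to a DataShard. It assumes range partitioning with sorted split keys; the class `RangeShardedTable`, its methods and the split values are hypothetical and only illustrate the idea from the slide.

```python
import bisect


class RangeShardedTable:
    """Maps a primary key to a shard index using sorted split keys (assumption)."""

    def __init__(self, split_keys):
        # N split keys produce N + 1 shards.
        self.split_keys = sorted(split_keys)

    def shard_for(self, primary_key):
        # Shard i holds keys in [split_keys[i - 1], split_keys[i]).
        return bisect.bisect_right(self.split_keys, primary_key)

    def shards_for(self, primary_keys):
        # A query touching several keys may span several shards.
        return sorted({self.shard_for(key) for key in primary_keys})


table = RangeShardedTable([100, 200])
print(table.shard_for(99), table.shard_for(100), table.shard_for(200))
print(table.shards_for([5, 150, 180]))
```

Whether a query touches one shard or several matters later, because single-shard and multi-shard queries are executed along different paths.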
Overview
[Diagram: hosts with compute resources (CPU, MEM) running tablets (SS, H, C, M, BSC, table DataShards DS) and storage on HDD, SSD and NVMe disks, grouped into fail domains and data centers (DC)]
YDB Tablet
[Diagram: a tablet (tables, app logic, SM, log, tablet's database) on top of Distributed Storage; interface calls shown: Put, Get, CollectGarbage and Put, Get, Block, Discover, CollectGarbage]
Number of Coordinator-DataShard connections
[Diagram: clients A, B, C, D; Coordinator X and Coordinator Y, each connected to DataShard X, DataShard Y and DataShard Z]
Mediator
[Diagram: clients A, B, C, D; Coordinator X and Coordinator Y; Mediator X, Mediator Y and Mediator Z placed between the coordinators and DataShard W, DataShard X, DataShard Y, DataShard Z]
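
The reason for adding mediators can be illustrated by counting connections. The sketch below rests on two assumptions that the slides only hint at: without mediators every coordinator talks to every DataShard, and with mediators every coordinator talks to every mediator while each DataShard is attached to exactly one mediator. The function names are hypothetical.

```python
def links_without_mediators(coordinators, datashards):
    # Every coordinator keeps a connection to every DataShard (assumption).
    return coordinators * datashards


def links_with_mediators(coordinators, mediators, datashards):
    # Coordinators fan out to mediators; each DataShard has one mediator (assumption).
    return coordinators * mediators + datashards


direct = links_without_mediators(2, 1000)
mediated = links_with_mediators(2, 3, 1000)
print(direct, mediated)
```

Under these assumptions the saving only shows up once the number of DataShards is much larger than the number of mediators; for the four DataShards drawn on the slide the mediated layout would actually have more links.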
Separate data and meta information
▎ TxBody: A sends it directly to the DataShards
▎ TxMeta: A sends it to the Coordinators, which pass it through the Mediators to the DataShards
Query classes
1. Read only, one shard: RO Immediate
2. Write only, one shard: WO Immediate
3. Read only / write only, multi shard: RO/WO
4. Read write, multi shard: RW
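
The four classes can be written down as a small decision function. This is a sketch: `classify` and its parameters are hypothetical names, and the single-shard read-write case is rejected because the slide does not list it.

```python
def classify(reads: bool, writes: bool, shard_count: int) -> str:
    """Return the query class from the list above (illustrative helper)."""
    if shard_count < 1 or not (reads or writes):
        raise ValueError("a query must touch at least one shard and read or write")
    if reads and writes:
        if shard_count > 1:
            return "RW"
        # Not covered by the slide, so this sketch refuses to guess.
        raise ValueError("single-shard read-write is not listed on the slide")
    if shard_count == 1:
        return "RO Immediate" if reads else "WO Immediate"
    return "RO/WO"


print(classify(reads=True, writes=False, shard_count=1))
print(classify(reads=False, writes=True, shard_count=3))
print(classify(reads=True, writes=True, shard_count=2))
```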
RO immediate transaction
▎ A to DataShard and back: 1 RTT
▎ Total
› 1 RTT
WO immediate transaction
▎ A to DataShard and back: 1 RTT
▎ DataShard persists the result: 1 write
▎ Total
› 2 RTT
› 1 disk IO
RO / WO transaction
▎ A sends TxBody to the DataShards, and each DataShard persists TxBody
▎ A sends TxMeta to the Coordinators, which persist the plan and pass it through the Mediators to the DataShards
▎ Each DataShard persists the PlanStep and returns the data
▎ Cost labels on the diagram: 1 RTT, 0.5 RTT, 0.5 RTT, 0.5 RTT, 1 write, 1 write, 1 plan batch time
▎ Total
› 6 RTT + 1 plan batch time + 3 disk IO
RW transaction
▎ A sends TxBody to the read DataShard and the write DataShard, and each persists TxBody
▎ A sends TxMeta to the Coordinators, which persist the plan and pass it through the Mediators to the DataShards
▎ Each DataShard persists the PlanStep; the out read set is sent to the write DataShard
▎ The write DataShard persists the result
▎ Cost labels on the diagram: 1 RTT, 0.5 RTT, 0.5 RTT, 0.5 RTT, 0.5 RTT, 1 write, 1 write, 1 write, 1 plan batch time
▎ Total
› 7.5 RTT + 1 plan batch time + 4 disk IO
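
The role of the plan and of the out read set can be sketched in a single process. Everything here is an illustrative assumption: `Shard`, `run_plan` and the transaction dictionaries are hypothetical, persistence and networking are left out, and ordering by `(plan_step, tx_id)` is just one way to get a deterministic order from a plan.

```python
from collections import defaultdict


class Shard:
    def __init__(self, name, data):
        self.name = name
        self.data = dict(data)
        self.inbox = defaultdict(dict)  # tx_id -> read set received from read shards


def run_plan(plan, shards):
    """Execute planned transactions in (plan_step, tx_id) order."""
    for step, tx_id, tx in sorted(plan, key=lambda item: (item[0], item[1])):
        writer = shards[tx["write_shard"]]
        # Read phase: each read shard sends its out read set to the write shard.
        for shard_name, key in tx["reads"]:
            writer.inbox[tx_id][key] = shards[shard_name].data[key]
        # Write phase: the write shard computes and stores the result.
        key, compute = tx["write"]
        writer.data[key] = compute(writer.inbox[tx_id])


shards = {"R": Shard("R", {"x": 1}), "W": Shard("W", {"y": 0})}
plan = [
    # Listed out of order on purpose: the plan step decides the execution order.
    (20, "tx2", {"reads": [("W", "y")], "write_shard": "R",
                 "write": ("x", lambda rs: rs["y"] * 2)}),
    (10, "tx1", {"reads": [("R", "x")], "write_shard": "W",
                 "write": ("y", lambda rs: rs["x"] + 1)}),
]
run_plan(plan, shards)
print(shards["R"].data, shards["W"].data)
```

Because every shard applies transactions in the same planned order, the outcome does not depend on the order in which the transactions arrived.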
Overview
1. Read only, one shard (RO Immediate): 1 RTT
2. Write only, one shard (WO Immediate): 2 RTT + 1 disk IO
3. Read only / write only, multi shard (RO/WO): 6 RTT + 1 plan batch time + 3 disk IO
4. Read write, multi shard (RW): 7.5 RTT + 1 plan batch time + 4 disk IO
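
To get a feel for these totals, they can be turned into a latency estimate. The coefficients below come from the list above; the timings passed in (0.5 ms per RTT, 5 ms per plan batch, 1 ms per disk IO) are made-up placeholders, and `Cost` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cost:
    rtt: float          # number of round trips
    plan_batches: int   # number of plan batch waits
    disk_ios: int       # number of disk writes

    def latency_ms(self, rtt_ms, plan_batch_ms, disk_io_ms):
        return (self.rtt * rtt_ms
                + self.plan_batches * plan_batch_ms
                + self.disk_ios * disk_io_ms)


COSTS = {
    "RO Immediate": Cost(1, 0, 0),
    "WO Immediate": Cost(2, 0, 1),
    "RO/WO": Cost(6, 1, 3),
    "RW": Cost(7.5, 1, 4),
}

# Placeholder timings, not measurements.
estimates = {
    name: cost.latency_ms(rtt_ms=0.5, plan_batch_ms=5.0, disk_io_ms=1.0)
    for name, cost in COSTS.items()
}
for name, value in estimates.items():
    print(f"{name}: {value} ms")
```

With these placeholder numbers the plan batch time dominates the multi-shard classes, which is why immediate transactions are worth a separate path.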
YQL transactions
[Diagram: transactions A (W, X), B (W, X, Y, Z) and C (W) go through the Coordinators and Mediators, which deliver the plan to DataShards DSW, DSX, DSY and DSZ; the DataShards hold optimistic locks]
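
The slide only labels the DataShards with "optimistic lock", so the following is a generic sketch of optimistic locking rather than a description of the actual mechanism. The assumptions: a read registers a lock, any later write to a covered key breaks that lock, and commit is refused when the lock is broken. The class and method names are hypothetical.

```python
class DataShard:
    """Toy shard with optimistic read locks (illustrative only)."""

    def __init__(self, data):
        self.data = dict(data)
        self.next_lock_id = 1
        self.locks = {}      # lock_id -> set of keys read under that lock
        self.broken = set()  # lock ids invalidated by later writes

    def read(self, key, lock_id=None):
        if lock_id is None:
            lock_id = self.next_lock_id
            self.next_lock_id += 1
            self.locks[lock_id] = set()
        self.locks[lock_id].add(key)
        return self.data[key], lock_id

    def write(self, key, value):
        # A write breaks every optimistic lock that covers the key.
        for lock_id, keys in self.locks.items():
            if key in keys:
                self.broken.add(lock_id)
        self.data[key] = value

    def commit(self, lock_id, writes):
        if lock_id in self.broken:
            self.broken.discard(lock_id)
            self.locks.pop(lock_id, None)
            return False  # the caller has to retry the transaction
        self.locks.pop(lock_id, None)  # release before applying own writes
        for key, value in writes.items():
            self.write(key, value)
        return True


shard = DataShard({"x": 1})
value_a, lock_a = shard.read("x")
value_b, lock_b = shard.read("x")
a_committed = shard.commit(lock_a, {"x": value_a + 1})
b_committed = shard.commit(lock_b, {"x": value_b + 10})
print(a_committed, b_committed, shard.data["x"])
```

Nothing blocks while the transactions read; the conflict is detected only at commit time, and the losing transaction retries.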
Isolation level
▎ Serializable: default isolation level
› Coordinated and immediate transactions
▎ Strict serializable: maximum isolation level
› All transactions are coordinated
What if two-phase commit
▎ A sends the transaction to the leader DataShard, which persists it
▎ First phase: the leader persists the first phase, the other DataShard persists locks
▎ Second phase: the leader persists the second phase, the other DataShard removes locks
▎ Cost labels on the diagram: 0.5 RTT, 1 RTT, 0.5 RTT, 1 write, 1 write
▎ Total
› 8 RTT + 5 disk IO
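
The two-phase commit total can be compared with the coordinated RW total from the earlier slide (7.5 RTT + 1 plan batch time + 4 disk IO). The sketch below is plain arithmetic on those two formulas; the function names and the sample timings are hypothetical, and it compares latency only.

```python
def rw_coordinated_ms(rtt_ms, plan_batch_ms, disk_io_ms):
    # 7.5 RTT + 1 plan batch time + 4 disk IO
    return 7.5 * rtt_ms + plan_batch_ms + 4 * disk_io_ms


def two_phase_commit_ms(rtt_ms, disk_io_ms):
    # 8 RTT + 5 disk IO
    return 8 * rtt_ms + 5 * disk_io_ms


def break_even_plan_batch_ms(rtt_ms, disk_io_ms):
    # The difference of the two totals: 0.5 RTT + 1 disk IO.
    return 0.5 * rtt_ms + disk_io_ms


# Placeholder timings, not measurements.
rtt_ms, disk_io_ms = 1.0, 1.0
threshold = break_even_plan_batch_ms(rtt_ms, disk_io_ms)
coordinated = rw_coordinated_ms(rtt_ms, plan_batch_ms=1.0, disk_io_ms=disk_io_ms)
two_phase = two_phase_commit_ms(rtt_ms, disk_io_ms)
print(threshold, coordinated, two_phase)
```

In this model the coordinated path has lower latency whenever the plan batch time stays below 0.5 RTT plus one disk IO.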