A multi-channel architecture for high-performance NAND flash-based storage system
Jeong-Uk Kang*, Jin-Soo Kim, Chanik Park, Hyoungjun Park, Joonwon Lee
Agenda
• Introduction
• Background
• Multi-channel architecture
  – Read Operation
  – Write Operation
  – Software Architecture
  – Striping
  – Interleaving
  – Pipelining
  – Combined System
• Evaluation
• Conclusion
• Appendix
Introduction (1/2)
• Flash memory throughput (K9LAG08U0M)
  – Write: register copy time (twc × 2048) + program time (tprog)
    • about 2.4 MBps
  – Read: page read time (tread) + register copy time (trc × 2048)
    • about 28.8 MBps
  – 2048 = 2048 bytes (page size) / 1 byte per cycle (8-bit bus width)
• Each flash chip is slow on its own, so parallelism is essential.
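The per-chip figures above can be reproduced with a short calculation. The sketch below assumes one byte moves per bus cycle on the 8-bit bus (2048 cycles for a 2048-byte page). The three timing constants are illustrative assumptions chosen to be consistent with the slide's results; they are not values quoted from the datasheet.

```python
PAGE_BYTES = 2048          # page size given on the slide
BYTES_PER_CYCLE = 1        # 8-bit bus: one byte per transfer cycle

# Assumed timings (hypothetical, picked to match the slide's ~2.4 / ~28.8 MBps).
T_CYCLE_S = 25e-9          # assumed twc = trc
T_PROG_S = 800e-6          # assumed tprog
T_READ_S = 20e-6           # assumed tread


def transfer_time_s(page_bytes=PAGE_BYTES, cycle_s=T_CYCLE_S):
    """Time to move one page between the host side and the chip's register."""
    cycles = page_bytes // BYTES_PER_CYCLE
    return cycles * cycle_s


def write_throughput_mbps():
    """Write = register copy + cell program, per page."""
    return PAGE_BYTES / (transfer_time_s() + T_PROG_S) / 1e6


def read_throughput_mbps():
    """Read = page read into register + register copy out, per page."""
    return PAGE_BYTES / (T_READ_S + transfer_time_s()) / 1e6


if __name__ == "__main__":
    print(f"write: {write_throughput_mbps():.1f} MBps")
    print(f"read:  {read_throughput_mbps():.1f} MBps")
```

The write path is dominated by the program time, which is why adding chips in parallel helps writes so much more than speeding up the bus would.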
Introduction (2/2)
• So how?
  – Striping: one request is split across channels (Request #1 becomes #1-A and #1-B)
  – Interleaving: queued requests (#1, #2, #3) are spread over different channels
  – Pipelining: queued requests (#1, #2, #3) overlap in time within a channel
Background
• NAND flash memory
  – Total size: 128 MB (1024 blocks)
  – Block size: 64 pages
  – Page size: 2048 B plus 64 B spare area
• Operations
  – Write, read, erase
• Features
  – Erase before write
  – Program/erase cycles: 10,000~1,000
  – Read delay: 10~25 µs
  – Program delay: 200~700 µs
  – Erase delay: 2~3 ms
  – Data transfer: 50 µs (8- or 16-bit bus width)
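The geometry and the erase-before-write rule can be checked with a small model. This is a minimal sketch, not the device's actual interface: the `Block` class and `split_page_number` helper are hypothetical names introduced only for illustration.

```python
PAGE_BYTES = 2048        # data area per page (spare area excluded)
PAGES_PER_BLOCK = 64
BLOCKS = 1024


def capacity_bytes():
    """Total data capacity implied by the geometry on the slide."""
    return BLOCKS * PAGES_PER_BLOCK * PAGE_BYTES


def split_page_number(page_no):
    """Map a flat page number to (block index, page index within block)."""
    if not 0 <= page_no < BLOCKS * PAGES_PER_BLOCK:
        raise ValueError("page number out of range")
    return divmod(page_no, PAGES_PER_BLOCK)


class Block:
    """Hypothetical model of one erase block; None marks an erased page."""

    def __init__(self):
        self.pages = [None] * PAGES_PER_BLOCK

    def program(self, page, data):
        # A page can be programmed only once after an erase.
        if self.pages[page] is not None:
            raise RuntimeError("page must be erased before it is written again")
        self.pages[page] = data

    def erase(self):
        # Erase works on the whole block, never on a single page.
        self.pages = [None] * PAGES_PER_BLOCK


if __name__ == "__main__":
    print(capacity_bytes() // (1024 * 1024), "MiB")
    print(split_page_number(130))
```

The product 1024 × 64 × 2048 B works out to the 128 MB stated on the slide, counting in binary megabytes.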
Multi-channel architecture
[Figure: block diagram of DUMBO. Labeled components: host, host interface, CPU, NOR flash, SDRAM, channel managers, control logic, NAND interface, Buffer 1, Buffer 2, and NAND flash chips. Labeled signals: CTR, INT, Data 32, DATA 16.]
Read Operation
[Figure: read timing between HOST and DUMBO]
• HOST: Read Set (RS), then Read Data (RD) after the interrupt
• DUMBO: Read from NAND (RN), consisting of setup, NAND busy, and data transfer
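Write Operation
[Figure: write timing between HOST and DUMBO]
• HOST: Write Data (WD), Write Set (WS), then Write Confirm (WC) after the interrupt
• DUMBO: Write to NAND (WN), consisting of setup and data transfer, and NAND Program (NP) while the NAND is busy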
Software Architecture
[Figure: software stack. Labeled layers: file system, I/O subsystem (request queue management), block device driver, Flash Translation Layer (FTL), low-level device driver (I/O scheduler, interrupt handler). Labeled regions: KERNEL, DUMBO.]
• FTL method: hybrid (the specific implementation is not described)
• Garbage collection: none
Striping
[Figure: timing diagrams]
• Read without striping: RS, RN, RD on a single channel manager
• Read with striping: RS, RN, RD on Channel Manager #1 and Channel Manager #2 at the same time
• Write without striping: WD, WS, WN, WC, NP on a single channel manager
• Write with striping: WD, WS, WN, WC, NP on Channel Manager #1 and Channel Manager #2 at the same time
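Striping splits a single request into sub-requests that different channel managers serve at once. The sketch below is one simple way to do that split. The page-number-modulo-channel mapping and the `stripe` function name are assumptions made for illustration; the slides do not state which mapping DUMBO uses.

```python
def stripe(start_page, page_count, channels):
    """Split one request into per-channel lists of page numbers.

    Assumed mapping (hypothetical): page p goes to channel p % channels.
    """
    if channels < 1:
        raise ValueError("channels must be at least 1")
    sub_requests = [[] for _ in range(channels)]
    for page in range(start_page, start_page + page_count):
        sub_requests[page % channels].append(page)
    return sub_requests


if __name__ == "__main__":
    # One 4-page request over 2 channels: each channel gets half the pages.
    for channel, pages in enumerate(stripe(0, 4, 2)):
        print(f"channel manager #{channel + 1}: pages {pages}")
```

With this mapping a request of n pages keeps each channel busy for roughly n / channels page operations, which is where the speed-up in the timing diagram comes from.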
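Interleaving
[Figure: timing diagrams]
• Read without interleaving: two requests run RS, RN, RD back to back on a single channel manager
• Read with interleaving: each request runs RS, RN, RD on its own channel manager (Channel Manager #1, Channel Manager #2)
• Write without interleaving: WD, WS, WN, WC, NP on a single channel manager
• Write with interleaving: each request runs WD, WS, WN, WC, NP on its own channel manager (Channel Manager #1, Channel Manager #2)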
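Unlike striping, interleaving keeps each request whole and hands different requests to different channel managers. A minimal sketch, assuming round-robin assignment and a fixed time per request; both assumptions and the function names are illustrative and are not taken from the slides.

```python
def interleave(requests, channels):
    """Assign whole requests to channels in round-robin order (assumed policy)."""
    if channels < 1:
        raise ValueError("channels must be at least 1")
    queues = [[] for _ in range(channels)]
    for index, request in enumerate(requests):
        queues[index % channels].append(request)
    return queues


def makespan(queues, time_per_request):
    """Finish time when every channel works through its own queue in parallel."""
    longest = max((len(queue) for queue in queues), default=0)
    return longest * time_per_request


if __name__ == "__main__":
    requests = ["#1", "#2", "#3", "#4"]
    for channels in (1, 2):
        queues = interleave(requests, channels)
        print(channels, "channel(s):", queues, "->", makespan(queues, 100), "time units")
```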
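Pipelining
[Figure: timing diagrams]
• Read without pipelining: RS, RN, RD run one after another on a channel manager
• Read with pipelining: the channel manager alternates between Buffer #1 and Buffer #2, so RN for one request overlaps RD for another
• Write without pipelining: WD, WS, WN, WC, NP run one after another on a channel manager
• Write with pipelining: the channel manager alternates between Buffer #1 and Buffer #2, so WD for one request overlaps WN and NP for another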
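The effect of the two buffers can be estimated with a small timeline simulation for reads. This sketch reduces a read to two stages, RN (DUMBO fills a buffer from NAND) and RD (the host drains it), and assumes a buffer is free again once its RD finishes. The stage durations in the example are arbitrary placeholders, not measurements.

```python
def serial_time(n, rn, rd):
    """Total time when each request finishes RN and RD before the next starts."""
    return n * (rn + rd)


def pipelined_time(n, rn, rd, buffers=2):
    """Total time when RN of one request may overlap RD of an earlier one.

    Request i uses buffer i % buffers, so its RN must wait until the RD of
    request i - buffers has drained that buffer.
    """
    rn_end = []
    rd_end = []
    for i in range(n):
        start = rn_end[i - 1] if i >= 1 else 0        # NAND side is busy until then
        if i >= buffers:
            start = max(start, rd_end[i - buffers])   # wait for a free buffer
        rn_end.append(start + rn)
        host_free = rd_end[i - 1] if i >= 1 else 0    # host drains one buffer at a time
        rd_end.append(max(rn_end[i], host_free) + rd)
    return rd_end[-1] if rd_end else 0


if __name__ == "__main__":
    n, rn, rd = 4, 70, 50   # placeholder durations in arbitrary time units
    print("serial:   ", serial_time(n, rn, rd))
    print("pipelined:", pipelined_time(n, rn, rd))
```

With a single buffer the model degenerates to the serial case, which matches the intuition that the second buffer is what makes the overlap possible.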
Combined system
[Figure: Requests #1 to #4 dispatched with interleaving, striping, and pipelining combined; Requests #2 and #4 are split into sub-requests #2-1, #2-2, #4-1, and #4-2]
• Example
  – Write: 2.4 × 8 = 19.2 MBps
  – Read: 28.8 × 8 = 230.4 MBps
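The example figures are ideal upper bounds: per-chip throughput multiplied by the number of units working in parallel. The sketch below reproduces that arithmetic. The factor of 8 comes from the slide, which does not say how it is composed, and the function name is hypothetical.

```python
def ideal_aggregate_mbps(per_chip_mbps, parallel_units):
    """Upper bound that ignores host, bus, and software overhead."""
    if parallel_units < 1:
        raise ValueError("parallel_units must be at least 1")
    return per_chip_mbps * parallel_units


if __name__ == "__main__":
    print(f"write: {ideal_aggregate_mbps(2.4, 8):.1f} MBps")
    print(f"read:  {ideal_aggregate_mbps(28.8, 8):.1f} MBps")
```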
Evaluation (1/3)
[Figure: READ and WRITE performance under striping, interleaving, and pipelining]
Evaluation (3/3)
[Figure: block device driver]
Appendix
Plane Parallelism (1/2)
• Copy sector to register: 93 µs, 2 KB (page size) × twc (8-bit bus)
  – Atomic write must be guaranteed (CPU-intensive operation)
  – Data is lost if power is cut
• Cell program: 800 µs
  – Data is written independently of the CPU
  – Once the cell program completes, the data persists regardless of power
• Register write speed: 35.7 MB/s
• Actual write speed: 4.3 MB/s
Plane Parallelism (2/2)
[Figure: timeline of overlapping Copy to Register and Copy to Cell steps across planes, shown for one flash chip and for 4 channels]
• One flash chip: 1172 µs per sequence = 13.3 MB/s
• 4 channels: 13.3 × 4 = 53.2 MB/s
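The 1172 figure is consistent with four register copies followed by one shared cell program. The sketch below checks that reading. It takes the appendix timings as 93 µs per register copy and 800 µs per cell program, and it assumes four planes; these readings are inferred from the numbers on the slides and are not stated there explicitly.

```python
COPY_TO_REGISTER_US = 93   # assumed: time to copy one unit into a plane register
CELL_PROGRAM_US = 800      # assumed: time to program registers into cells


def sequential_us(planes):
    """Each plane copies and programs before the next plane starts."""
    return planes * (COPY_TO_REGISTER_US + CELL_PROGRAM_US)


def plane_parallel_us(planes):
    """Register copies share the bus one after another; programming happens once for all planes."""
    return planes * COPY_TO_REGISTER_US + CELL_PROGRAM_US


if __name__ == "__main__":
    planes = 4  # assumed plane count
    print("sequential:    ", sequential_us(planes), "us")
    print("plane-parallel:", plane_parallel_us(planes), "us")
```

Under these assumptions the bus transfers cannot overlap each other, so the saving comes entirely from paying the long cell-program time once instead of once per plane.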
References
• K9XXG08UXM Datasheet, Samsung Electronics