An Efficient External Sorting Algorithm for Flash Memory

An Efficient External Sorting Algorithm for Flash Memory Embedded Devices Tyler Cossentine - M. Sc. Thesis Defense

Overview
- Introduction
- Previous work
- Flash MinSort
- Experimental results
- Conclusions

Introduction
- Embedded systems are devices that perform a few simple functions.
- Embedded devices typically have limited power, memory, and computational resources.
- Many embedded systems applications involve storing and querying large datasets.
- Sorting algorithms are commonly used in query processing.

Embedded Devices
- Not designed to be general-purpose devices.
  - Wireless sensor networks, smart cards, etc.
- Can communicate with other devices through wired or wireless interfaces.
- Hardware constraints:
  - Battery powered
  - Low-power microcontroller
  - Limited memory (as little as 1 kB)
  - Small amount of local storage (flash or EEPROM)

Sensor Networks
- Sensor networks are used in military, environmental, agricultural, and industrial applications.
- A wireless sensor node contains a microcontroller, sensing system, local storage, battery, and wireless radio.
- Devices may process data locally or send it to a common collection point (sink) for processing.
- On-device data storage and query processing has the potential to reduce communication and energy use [6][8].

Flash Memory
- A type of EEPROM
  - Available in higher capacities
  - Organized as pages of data
  - A page is erased before it is written
  - The erase unit is typically a block of pages
- Two types: NOR and NAND
  - NOR memory supports byte-level reads
  - NAND requires error-correcting code (ECC)
- Unique performance characteristics
  - Asymmetric read and write costs (reads are 10 to 100 times faster than writes)
  - Low-cost random reads
  - Memory wear

![Flash Memory Array [1]](https://slidetodoc.com/presentation_image_h/2120c9b9ba3c694a180264c1e9f7dc1e/image-7.jpg)
Flash Memory Array [1]

![Flash Memory Block Diagram [1]](https://slidetodoc.com/presentation_image_h/2120c9b9ba3c694a180264c1e9f7dc1e/image-8.jpg)
Flash Memory Block Diagram [1]

Relation

Sorting Algorithms
- Sorting is a fundamental class of algorithms because it enables efficient ordering of results, joins, grouping, and aggregation.
- An in-memory sort can be performed when the entire dataset fits into memory:
  - Merge sort
  - Quicksort
- External sorting:
  - Uses external memory (e.g., a hard disk) to sort the dataset
  - External merge sort is the standard in databases

Previous Work: One Key Scan
- The most memory-efficient external sorting algorithm is one key scan [2].
  - Performs D+1 scans, where D is the number of distinct sort key values.
  - Keeps track of:
    - current: the sort key value being output in this scan.
    - split: the next smallest sort key value encountered.
- The algorithm needs an initial scan to determine the values of current and split.
  - Requires enough memory to store two sort key values.

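
The one key scan procedure above can be sketched in Python as a minimal illustration, not the implementation from [2]. The relation is assumed to be re-readable through a caller-supplied function (`read_relation`, a hypothetical name), and only the two key values `current` and `split` are held between tuples. One small difference from the slide: here the initial scan finds only `current`, and `split` is found during each output scan; the total is still D+1 scans.

```python
from typing import Any, Callable, Iterable, Iterator


def one_key_scan(read_relation: Callable[[], Iterable[Any]],
                 key: Callable[[Any], Any]) -> Iterator[Any]:
    """Yield tuples in sort-key order using D+1 scans of the relation and only
    two key values (current, split) of working memory."""
    # Initial scan: find the smallest sort key value.
    current = None
    for t in read_relation():
        k = key(t)
        if current is None or k < current:
            current = k
    # One scan per distinct value: output matches, remember the next value.
    while current is not None:
        split = None  # smallest key strictly greater than current
        for t in read_relation():
            k = key(t)
            if k == current:
                yield t
            elif k > current and (split is None or k < split):
                split = k
        current = split
```

Because every scan re-reads the whole relation, the cost grows with the number of distinct values D, which is why this approach suits data with few distinct keys.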
![Previous Work: Heap Sort](https://slidetodoc.com/presentation_image_h/2120c9b9ba3c694a180264c1e9f7dc1e/image-12.jpg)

Previous Work: Heap Sort
- A heap sort algorithm called FAST(1) [7] uses a binary heap of N tuples to store the next smallest tuples encountered during a scan.
  - Performs T/N scans, where T is the number of tuples and N is the number of tuples that fit into memory.
  - Requires enough memory to store at least one tuple.
  - May be slower than one key scan if there are few distinct sort key values, the tuple size is large, or the dataset is large.

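
A simplified multi-pass heap sort in the spirit of FAST(1) [7] is sketched below; it is an illustration, not the published algorithm. Each scan keeps the `n` smallest tuples not yet output in a bounded max-heap, so T tuples need about T/n scans. Ties between equal keys are broken by the tuple's position so that no tuple is output twice, and keys are assumed to be numeric so they can be negated to turn Python's min-heap `heapq` into a max-heap. The name `heap_pass_sort` is hypothetical.

```python
import heapq
from typing import Any, Callable, List, Sequence, Tuple


def heap_pass_sort(relation: Sequence[Any], key: Callable[[Any], float],
                   n: int) -> Tuple[List[Any], int]:
    """Sort by repeated scans. Each scan keeps the n smallest tuples that have
    not been output yet in a bounded max-heap. Returns (sorted, scan count)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    out: List[Any] = []
    last = None  # (key, position) of the last tuple output so far
    scans = 0
    while len(out) < len(relation):
        scans += 1
        heap: List[Tuple[float, int, Any]] = []
        for pos, t in enumerate(relation):
            rank = (key(t), pos)  # position breaks ties between equal keys
            if last is not None and rank <= last:
                continue  # already output in an earlier scan
            item = (-rank[0], -rank[1], t)  # negated: heap root = largest rank
            if len(heap) < n:
                heapq.heappush(heap, item)
            elif item > heap[0]:  # smaller rank than the largest one kept
                heapq.heapreplace(heap, item)
        batch = sorted(heap, reverse=True)  # ascending by (key, position)
        out.extend(t for _, _, t in batch)
        last = (-batch[-1][0], -batch[-1][1])
    return out, scans
```

The scan count depends only on T and n, not on the key distribution, which is consistent with the later observation that heap sort performs the same number of passes on random, real, and ordered data.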
![Previous Work: External Merge Sort](https://slidetodoc.com/presentation_image_h/2120c9b9ba3c694a180264c1e9f7dc1e/image-13.jpg)

Previous Work: External Merge Sort
- The external merge sort [5] algorithm is the standard sorting algorithm used in databases.
  - An initial read pass constructs sorted sublists, each the size of the RAM allocated to the operator.
  - The merge phase can consist of multiple passes.
  - Each pass buffers one page from each of the sublists, performs a merge, and writes a temporary result to flash.
  - The algorithm requires at least three pages of memory.
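
The run-generation and merge phases above can be simulated in memory as a sketch. Python lists stand in for the sorted sublists that would be written to flash, `run_size` is the number of records that fit in the RAM given to the operator, and `fan_in` is the number of sublists merged at once. Assuming one page buffer per input sublist plus one output page, the three-page minimum corresponds to `fan_in = 2`; that mapping and the function name are assumptions of this sketch. `heapq.merge` performs the k-way merge.

```python
import heapq
from typing import Any, List, Tuple


def external_merge_sort(records: List[Any], run_size: int,
                        fan_in: int) -> Tuple[List[Any], int]:
    """Simulate external merge sort: sorted runs of run_size records, then
    merge passes that combine up to fan_in runs at a time.
    Returns (sorted records, number of merge passes)."""
    if run_size < 1 or fan_in < 2:
        raise ValueError("need run_size >= 1 and fan_in >= 2")
    # Run generation: read the input once and write sorted runs.
    runs = [sorted(records[i:i + run_size])
            for i in range(0, len(records), run_size)]
    passes = 0
    while len(runs) > 1:
        passes += 1
        # Each merge pass reads every run and writes fewer, longer runs.
        runs = [list(heapq.merge(*runs[i:i + fan_in]))
                for i in range(0, len(runs), fan_in)]
    return (runs[0] if runs else []), passes
```

Every merge pass reads and rewrites the whole dataset, so on flash each pass pays the write cost; this is why the write-to-read ratio appears in the cost formula for external merge sort.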

Previous Work: Summary
- External merge sort requires writes and a significant amount of memory, which can make it infeasible in some embedded applications.
- Existing sorting algorithms for datasets stored in flash memory favor reads over writes.
- Existing sorting algorithms do not take advantage of low-cost random reads.
- Performance depends on the properties of the input dataset.
- Data collected in applications such as sensor networks is often clustered spatially and temporally.

![Flash MinSort Overview](https://slidetodoc.com/presentation_image_h/2120c9b9ba3c694a180264c1e9f7dc1e/image-15.jpg)

Flash MinSort Overview
- Flash MinSort [3] uses low-cost random reads to retrieve only the required pages during a scan of the relation.
- It builds a dynamic index over the relation that stores the minimum value in each region.
- A region represents one or more pages of data.
- The algorithm maintains a current minimum value and a next minimum value.
- During a pass, only pages in a region whose minimum value equals the current minimum are read.

Flash MinSort Overview
- While reading a region, the algorithm keeps track of the next smallest value in the region (next) and the index of the next tuple with the current value (nextIdx).
- After a region has been read, its minimum value in the index is updated.
- The algorithm adapts to the size of the input relation and caches pages when given additional memory.
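
The following is a simplified, in-memory sketch of the Flash MinSort idea, not the implementation from [3]. Pages are modeled as Python lists, each region covers `pages_per_region` adjacent pages, and the index holds one minimum per region. For each distinct value, only regions whose minimum equals `current` are read, and each region's minimum is then advanced to the smallest larger value it contains. Keys are assumed numeric so `math.inf` can mark an exhausted region; the nextIdx bookkeeping and page caching described above are omitted, and the name `flash_minsort` is hypothetical.

```python
import math
from typing import Any, Callable, Iterator, Sequence


def flash_minsort(pages: Sequence[Sequence[Any]], key: Callable[[Any], float],
                  pages_per_region: int = 1) -> Iterator[Any]:
    """Simplified Flash MinSort: keep each region's minimum key in RAM and, for
    each distinct key value, read only the regions whose minimum equals it."""
    if pages_per_region < 1:
        raise ValueError("pages_per_region must be at least 1")
    regions = [range(s, min(s + pages_per_region, len(pages)))
               for s in range(0, len(pages), pages_per_region)]
    # Initial scan: build the index of region minimums.
    index = [min((key(t) for p in r for t in pages[p]), default=math.inf)
             for r in regions]
    current = min(index, default=math.inf)
    while current != math.inf:
        for i, r in enumerate(regions):
            if index[i] != current:
                continue  # region cannot contain `current`; skip its pages
            nxt = math.inf  # smallest key in this region greater than current
            for p in r:  # on real flash, each page here is a random read
                for t in pages[p]:
                    k = key(t)
                    if k == current:
                        yield t
                    elif current < k < nxt:
                        nxt = k
            index[i] = nxt  # advance the region minimum
        current = min(index)
```

Compared with one key scan, a pass for a given value touches only the regions that contain it, so temporally clustered sensor data (few distinct values per region) leads to far fewer page reads.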

Flash MinSort Example (figure: a 12-page dataset, its Flash MinSort index, and the page buffer)
- Index (minimum value of each region, one page per region, pages 1 to 12): 1, 9, 9, 7, 5, 2, 1, 1, 5, 9, 8, 9
- Output #1: scan the min index; find 1 in region #1; search page #1; output tuple #1 (next = 9, nextIdx = 4).
- Output #2: output tuple #4; region minimum set to 9.
- Output #3: find 1 in region #7; search page #7; output tuple #2 (next = 2, nextIdx = 4).
- Output #4: output tuple #4; region minimum set to 2.
- Output #5: find 1 in region #8; search page #8; output tuple #1 (next = ∞, nextIdx = 2).
- Output #6: output tuple #2 (next = ∞, nextIdx = 3).
- Output #7: output tuple #3 (next = ∞, nextIdx = 4).
- Output: 1 (pg. 1, tuple 1), 1 (pg. 1, tuple 4), 1 (pg. 7, tuple 2), 1 (pg. 7, tuple 4), 1 (pg. 8, tuple 1), 1 (pg. 8, tuple 2), 1 (pg. 8, tuple 3), 1 (pg. 8, tuple 4), 2 (pg. 6, tuple 4), 2 (pg. 7, tuple 1), 2 (pg. 7, tuple 3), ...

Flash MinSort Performance
- In the ideal case, each region represents a single page.
- The amount of memory required to store the minimum value of each page is LK * P, where LK is the size of the sort key and P is the number of pages.
- If there is not enough memory, each region represents two or more adjacent pages.
- The minimum amount of memory required is 4 * LK, for two regions.
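
The memory rule above can be turned into a small planning function. This sketch rests on an assumption the slide does not state: that the 4 * LK minimum consists of two region minimums plus the current and next values, so 2 * LK bytes are reserved and the remainder holds the index. `plan_regions` is a hypothetical helper name.

```python
import math
from typing import Tuple


def plan_regions(mem_bytes: int, key_bytes: int, num_pages: int) -> Tuple[int, int]:
    """Return (number of regions, pages per region) so the min index fits.
    Assumes 2 * key_bytes are reserved for the current and next values."""
    if num_pages <= 0:
        return 0, 0
    index_budget = mem_bytes - 2 * key_bytes
    if index_budget < 2 * key_bytes:
        raise ValueError("at least 4 * LK bytes are needed (two regions)")
    max_regions = min(num_pages, index_budget // key_bytes)
    pages_per_region = math.ceil(num_pages / max_regions)
    # Adjacent pages are grouped, so the last region may be smaller.
    return math.ceil(num_pages / pages_per_region), pages_per_region
```

For example, 10,000 records of 16 bytes in 512-byte pages occupy 313 pages (rounded up); under this sketch's assumptions, a 2048-byte budget with 2-byte keys gives one page per region, while only 8 bytes forces two regions of up to 157 pages each.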

Flash MinSort Direct Reads
- If the flash chip supports direct byte reads, Flash MinSort is even more efficient because it only needs to read the sort key values.
- Performance:
  - P = number of pages, T = number of tuples, NP = number of pages in a region
  - DR = average number of distinct values in a region, R = number of regions
  - LK = size of key in bytes, LT = size of tuple in bytes

Flash MinSort Comparison
- Considering only page reads, Flash MinSort is:
  - Faster than one key sort in all cases.
  - Faster than heap sort unless the input size is only a small multiple of the memory size (e.g., 2 to 5).
  - Faster than external merge sort for a large range of possible configurations, even while using less memory and performing no writes.

| Algorithm | Page I/Os | Notes |
|---|---|---|
| Flash MinSort | P * (1 + DR) | |
| One Key Sort | P * (1 + D) | Performs a scan for each distinct key |
| Heap Sort | P * (T * LT) / M | Number of scans based on number of tuples; M is the memory size in bytes |
| External Merge Sort (two pass) | P * (2 + X) | X is the write-to-read ratio, as the algorithm must write as an intermediate step; two passes is not likely for small memory sizes |
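
The page I/O formulas in the table can be evaluated side by side with a small helper. The function name and the test parameters are illustrative, not values from the thesis; the formulas themselves are copied from the table, with M taken to be the operator memory in bytes and a page read counted as one unit.

```python
from typing import Dict


def page_io_estimates(P: int, T: int, D: int, DR: float, LT: int,
                      M: int, X: float) -> Dict[str, float]:
    """Page I/O estimates from the comparison table (one page read = 1 unit)."""
    return {
        # Initial scan plus, on average, DR reads of each page.
        "Flash MinSort": P * (1 + DR),
        # One full scan per distinct key plus the initial scan.
        "One Key Sort": P * (1 + D),
        # One full scan per memory-load of tuples.
        "Heap Sort": P * (T * LT) / M,
        # Read, write (costing X reads per page), and read again.
        "External Merge Sort (two pass)": P * (2 + X),
    }
```

Because DR (distinct values per region) can never exceed D (distinct values overall), the Flash MinSort estimate never exceeds the one key sort estimate, which matches the first comparison bullet.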

Experimental Evaluation
- The experimental evaluation compares Flash MinSort, one key sort, heap sort, and external merge sort.
- 2 kB of memory is available to the operators.
- Sensor node hardware:
  - Atmel ATmega644P (8 MHz)
  - 4 KB SRAM
  - 2 MB Atmel AT45DB161D serial flash (512-byte page size)
  - The node design was used for field measurement of soil moisture for an automated irrigation controller [4].
- Dataset:
  - Three months of live soil-sensing data, plus generated ordered and random datasets. The real dataset has 10,000 records (160 KB) and 43 distinct values.
  - Record size is 16 bytes. The sort key is a 2-byte integer.

Raw Device Performance
- Time to read 50,000 tuples: 5.3 seconds
- Time to write 50,000 tuples: 23 seconds
- Write-to-read ratio: 4.7
- Time to scan 50,000 sort keys: 2.1 seconds
- Notes:
  - Buffering a page in processor memory is more efficient than using the on-chip buffers because of bus communication and latency.
  - Bus speeds affect the write-to-read ratio. Even though writing is considerably slower on the chip, this was masked by the speed of the processor and bus.

Real Data
- Heap sort is not shown because its run time is orders of magnitude longer:
  - 100 bytes (5 tuples): 10,000 passes, 3,377 seconds
  - 1200 bytes (74 tuples): 302 seconds
- MinSortDR is the direct-read version of MinSort.
- External merge sort: 1536 bytes (3 pages): 7 passes, 76 seconds

Random Data
- Dataset with 10,000 records and 500 distinct values (1 to 500).
- Heap sort performs the same number of passes regardless of the dataset (random, real, or ordered).
- External merge sort took 78 seconds, as sorting during initial run generation took slightly longer.

Ordered Data
- Sorted real dataset with 10,000 tuples and 43 distinct values.
- MinSort did not detect sorted regions but still benefits from detecting duplicates of the same value within a region.
- External merge sort took 75 seconds.

Results Summary
- MinSort is faster than one key sort and heap sort, with or without direct byte reads from the device.
  - Especially good for sensor data that exhibits temporal clustering.
  - MinSort is a generalization of one key sort, and the performance of both algorithms depends on the number of distinct values.
- Heap sort is not competitive for small memory sizes.
  - The ratio of available RAM to dataset size is key.

Results Summary
- External merge sort performs well but requires at least three pages (1,536 bytes) of memory.
  - For the real dataset on this platform, external merge sort will never be faster than MinSort, assuming at least two passes.
  - For wireless sensing applications, dealing with the additional space and wear leveling complicates system design and performance.

Solid State Drives: Experimental Setup
- Solid state drives (SSDs) have sophisticated controllers that support wear leveling, address translation, and buffer management.
- Test system:
  - AMD Opteron 2.1 GHz
  - 32 GB DDR3
  - Intel X25 SSD (1.6 write-to-read ratio)
- Data:
  - 5,000 tuples (80 MB)
  - 16-byte tuples

Solid State Drives: Real Data (43 distinct sort key values)

Solid State Drives: Random Data (500 distinct sort key values)

Conclusion
- Flash MinSort is a sorting algorithm designed for datasets stored in flash memory on computationally constrained embedded devices.
- It outperforms existing algorithms by exploiting low-cost random reads.
- Depending on the properties of the dataset, Flash MinSort can outperform external merge sort on SSDs.

![References](https://slidetodoc.com/presentation_image_h/2120c9b9ba3c694a180264c1e9f7dc1e/image-32.jpg)

References
- [1] Atmel Flash AT45DB161D Data Sheet, 2010.
- [2] N. Anciaux, L. Bouganim, and P. Pucheral. Memory Requirements for Query Execution in Highly Constrained Devices. In VLDB, pages 694–705, 2003.
- [3] T. Cossentine and R. Lawrence. Fast Sorting on Flash Memory Sensor Nodes. In IDEAS 2010, pages 105–113, 2010.
- [4] S. Fazackerley and R. Lawrence. Reducing Turfgrass Water Consumption Using Sensor Nodes and an Adaptive Irrigation Controller. In Sensors Applications Symposium, Limerick, Ireland, 2010.
- [5] H. Garcia-Molina, J. D. Ullman, and J. Widom. Database Systems: The Complete Book. Prentice Hall Press, Upper Saddle River, NJ, USA, 1st edition, 2002.
- [6] G. Mathur, P. Desnoyers, D. Ganesan, and P. Shenoy. Ultra-Low Power Data Storage for Sensor Networks. In Proceedings of the 5th International Conference on Information Processing in Sensor Networks, IPSN '06, pages 374–381, New York, NY, USA, 2006. ACM.

![References](https://slidetodoc.com/presentation_image_h/2120c9b9ba3c694a180264c1e9f7dc1e/image-33.jpg)

- [7] H. Park and K. Shim. FAST: Flash-Aware External Sorting for Mobile Database Systems. Journal of Systems and Software, 82(8):1298–1312, 2009.
- [8] G. J. Pottie and W. J. Kaiser. Wireless Integrated Network Sensors. Communications of the ACM, 43:51–58, May 2000.