DiskTrie: An Efficient Data Structure Using Flash Memory for Mobile Devices

N. M. Mosharaf Kabir Chowdhury, Md. Mostofa Akbar, M. Kaykobad
February 12, 2007, WALCOM 2007

Outline
  • Problem statement
  • Current status and motivation for a new solution
  • Preliminaries
  • DiskTrie idea
  • Results
  • Limitations
  • Future directions

Problem Statement
  • Let S be a static set of n unique finite strings supporting the following operations:
      - Lookup(str): check whether the string str belongs to S
      - Prefix-Matching(P): find all elements of S that have the prefix P
  • The problem: design an efficient data structure that supports these operations on low-spec mobile devices
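As a reference point for what the two operations must return, the following brute-force sketch states their semantics directly. The function names and the sample set are illustrative assumptions, not part of the presentation, and the linear scan in prefix_matching is exactly the cost the data structure is meant to avoid.

```python
def lookup(strings, s):
    """Return True if s is a member of the static set."""
    return s in strings


def prefix_matching(strings, prefix):
    """Return every member that starts with prefix, in sorted order."""
    return sorted(x for x in strings if x.startswith(prefix))


S = frozenset(["car", "cart", "cat", "dog"])  # hypothetical sample set
print(lookup(S, "cat"))
print(prefix_matching(S, "ca"))
```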

Current Status
  • Use of mobile devices and sensor networks is increasing rapidly
  • Mobile devices and embedded systems are characterized by:
      - Low processing power
      - Low memory (both internal and external)
      - Low power consumption
  • Data structures and algorithms that target these devices have wide application

Motivation for a New Solution
  • Use of external memory is necessary
  • Popular external-memory data structures for conventional computers include the String B-tree and the hierarchy of indexes
  • The problem has not yet been studied in depth for flash memory (Gal and Toledo)
  • Goal: a data structure that is more space-efficient (in both internal and external memory) while remaining competitive in time

Flash Memory
  • Common memory used extensively in mobile and handheld devices
  • Its read/write/erase behavior differs from that of other programmable memories
  • NOR flash supports random access and provides byte-level addressing
  • NAND flash is faster and provides block-level access

Trie
  • A trivial trie is an m-ary tree
  • Keys are stored at the leaf level; each unique path from the root to a leaf corresponds to a unique key
  • Search time depends on the key length rather than on n, so it can be treated as O(1) for keys of bounded length
  [Figure: example trie with inner nodes and leaf nodes labeled]
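A minimal in-memory sketch of such a trie, assuming character-level branching and a terminator symbol so that every key ends at a leaf. The class and function names and the choice of "\0" as terminator are assumptions made for illustration.

```python
END = "\0"  # assumed terminator symbol; keys must not contain it


class TrieNode:
    def __init__(self):
        self.children = {}  # maps a symbol to the child node


def insert(root, key):
    """Add key to the trie, creating nodes along its path as needed."""
    node = root
    for symbol in key + END:
        node = node.children.setdefault(symbol, TrieNode())


def contains(root, key):
    """Follow key symbol by symbol; succeed only if its leaf exists."""
    node = root
    for symbol in key + END:
        node = node.children.get(symbol)
        if node is None:
            return False
    return True


root = TrieNode()
for word in ["car", "cart", "cat"]:
    insert(root, word)
print(contains(root, "cart"), contains(root, "ca"))
```

The loop in contains runs once per symbol of the key, independent of how many keys are stored.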

Binary Trie and Path Compression
  [Figure: a binary trie and the corresponding path-compressed binary trie]
  • Binary encoding ensures that every node has a maximum degree of two
  • The depth of the trie increases as a result
  • Path compression is used to reduce the depth

Patricia Trie & LPC Trie
  [Figure: a Patricia trie and the corresponding level- and path-compressed trie]
  • A Patricia trie is similar to a path-compressed trie but needs less memory
  • A level- and path-compressed (LPC) trie reduces the depth further, but the trie is no longer binary
  • Nilsson and Tikkanen have shown that an LPC trie has an expected average depth of Θ(log* n)
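The two compressions can be sketched together as follows. This is an illustrative construction, not Nilsson and Tikkanen's implementation: it assumes a non-empty, sorted list of distinct bit strings of equal length, and the dictionary layout and the names build_lpc and depth are hypothetical.

```python
def build_lpc(keys, pos=0):
    """Build an LPC trie from a sorted list of distinct, equal-length bit strings."""
    if len(keys) == 1:
        return {"leaf": keys[0]}
    # Path compression: skip bit positions on which all keys agree.
    # The keys are sorted, so comparing the first and last key suffices.
    while keys[0][pos] == keys[-1][pos]:
        pos += 1
    # Level compression: widen the node while every bit pattern occurs.
    width = 1
    while (pos + width < len(keys[0])
           and len({k[pos:pos + width + 1] for k in keys}) == 2 ** (width + 1)):
        width += 1
    groups = {}
    for k in keys:
        groups.setdefault(k[pos:pos + width], []).append(k)
    children = {p: build_lpc(g, pos + width) for p, g in groups.items()}
    return {"pos": pos, "width": width, "children": children}


def depth(node):
    """Number of inner nodes on the longest root-to-leaf path."""
    if "leaf" in node:
        return 0
    return 1 + max(depth(c) for c in node["children"].values())


full = [f"{i:03b}" for i in range(8)]  # all eight 3-bit keys
print(depth(build_lpc(full)))
```

For the eight 3-bit keys the root branches on all three bits at once, so the depth is 1, whereas an uncompressed binary trie over the same keys has depth 3.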

DiskTrie Idea
  • Static external-memory implementation of the LPC trie
  • Pre-build the trie on a computer and then transfer it to flash memory
  • Three distinct phases:
      - Creation on a computer
      - Placement in flash memory
      - Retrieval

Creation and Placement
  • All strings are lexicographically sorted and placed contiguously in flash memory
  • DiskTrie nodes are placed separately from the strings; leaf nodes contain pointers to the strings they represent
  • Page boundaries are always maintained for NAND memory
  • All child nodes of a parent are placed in sequence to reduce the number of pointers
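One plausible reading of the last two placement rules is that each group of sibling nodes is written contiguously and never straddles a page. The sketch below follows that reading; the function name, the fixed node size, and the rule of skipping to the next page are assumptions, since the slides do not give the exact layout.

```python
def place_groups(group_sizes, node_bytes, page_bytes):
    """Assign a byte offset to each sibling group without crossing a page.

    group_sizes lists the number of nodes in each sibling group, in
    placement order. Returns (offsets, wasted_bytes).
    """
    offsets = []
    cursor = 0
    wasted = 0
    for count in group_sizes:
        need = count * node_bytes
        if need > page_bytes:
            raise ValueError("sibling group does not fit in one page")
        room = page_bytes - cursor % page_bytes  # bytes left in current page
        if need > room:
            wasted += room   # pad out the rest of the page
            cursor += room
        offsets.append(cursor)
        cursor += need
    return offsets, wasted


# Three groups of two 8-byte nodes packed into 40-byte pages.
print(place_groups([2, 2, 2], node_bytes=8, page_bytes=40))
```

Because siblings are contiguous, a parent in this layout needs only the offset of its first child; the other children follow at fixed strides.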

Retrieval
  • Deals with two types of operations:
      - Lookup
      - Prefix-Matching
  • Lookup starts from the root and proceeds until the search string is exhausted or a leaf is reached
  • Each step retrieves a single node from flash for NOR memory and a whole block for NAND memory

Lookup Algorithm

procedure Lookup(str) {
    currentNode ← root
    while (str is not exhausted AND currentNode is NOT a leaf node)
        select childNode using str
        currentNode ← childNode
    end while
    if (error)
        return false
    end if
    return CompareStrings(str, currentNode→str)
}
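The procedure can be made concrete with a small in-memory sketch. It uses a path-compressed binary trie rather than the LPC trie of DiskTrie, and it models the "error" case as the query running out of bits before a leaf is reached; both choices, along with all names here, are assumptions for illustration. Keys are assumed non-empty as a set and free of NUL characters.

```python
def to_bits(s):
    """Encode s as a bit string; the trailing zero byte keeps keys prefix-free."""
    return "".join(f"{byte:08b}" for byte in s.encode("utf-8") + b"\x00")


def build(pairs, lo, hi, pos):
    """Build the subtrie for pairs[lo:hi], whose bit strings agree before pos."""
    if hi - lo == 1:
        return ("leaf", lo)
    first, last = pairs[lo][0], pairs[hi - 1][0]
    while first[pos] == last[pos]:      # path compression: skip shared bits
        pos += 1
    mid = lo
    while pairs[mid][0][pos] == "0":    # find the first key whose bit at pos is 1
        mid += 1
    return ("inner", pos,
            build(pairs, lo, mid, pos + 1),
            build(pairs, mid, hi, pos + 1))


def make_trie(keys):
    """Sort the keys by their bit encoding and build the trie over them."""
    pairs = sorted((to_bits(k), k) for k in set(keys))
    return pairs, build(pairs, 0, len(pairs), 0)


def lookup(pairs, root, s):
    query = to_bits(s)
    node = root
    while node[0] == "inner":
        _, pos, left, right = node
        if pos >= len(query):           # the "error" case: query exhausted
            return False
        node = right if query[pos] == "1" else left
    return pairs[node[1]][1] == s       # CompareStrings on the candidate leaf


pairs, root = make_trie(["car", "cart", "cat", "dog"])
print(lookup(pairs, root, "cat"), lookup(pairs, root, "cot"))
```

The final comparison is essential: the descent inspects only the branching bits, so a query such as "cot" can reach the leaf for "cat" and is rejected only when the full strings are compared.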

Retrieval (Cont.)
  • For the Prefix-Matching operation, searching takes place in two phases:
      - Identification of a prospective leaf node to find the longest common prefix
      - Identification of the sub-trie or sub-tries that contain the results

Illustration of the Prefix-Matching Operation
  [Figure: (a) ‘P’ ends in a node; (b) ‘P’ ends in an arc]

Prefix-Matching Algorithm

procedure Prefix-Matching(P) {
    currentNode ← root
    while (P is not exhausted AND currentNode is NOT a leaf node)
        select childNode using P
        currentNode ← childNode
    end while
    if (error)
        return NULL
    end if
    lNode ← left-most node in the probable region
    rNode ← right-most node in the probable region
    return all strings in the range from lNode to rNode
}
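The last step works because the strings are stored in sorted order, so all matches form one contiguous range. The sketch below shows only that range idea: it swaps the trie descent for binary search over an in-memory sorted list, which is not how DiskTrie locates the range. The function name prefix_range is hypothetical.

```python
import bisect


def prefix_range(sorted_keys, prefix):
    """Return (lo, hi) such that sorted_keys[lo:hi] are the keys starting with prefix."""
    lo = bisect.bisect_left(sorted_keys, prefix)   # left-most candidate
    # From lo onward the matches come first, so binary-search their end.
    left, right = lo, len(sorted_keys)
    while left < right:
        mid = (left + right) // 2
        if sorted_keys[mid].startswith(prefix):
            left = mid + 1
        else:
            right = mid
    return lo, left


keys = sorted(["car", "cart", "cat", "do", "dog"])
lo, hi = prefix_range(keys, "ca")
print(keys[lo:hi])
```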

Results - Storage Requirement
  • DiskTrie needs two sets of components to be stored in external memory:
      - The actual strings
      - The data structure itself
  • Storing all the key strings takes linear space
  • A Patricia trie holding n strings has 2n - 1 nodes (n leaves and n - 1 internal nodes)
  • Hence the storage requirement for the whole data structure is also linear
  • Block boundaries must be maintained while storing the nodes, which results in some wasted space

Results (Cont.) - Complexity of the Operations
  • Lookup
      - Fetch only those nodes from flash that are on the path to the goal node
      - The number of disk accesses is bounded by the depth of the trie, which is in turn Θ(log* n) in expectation
  • log* n is the iterated logarithm, defined as:
      - log* 1 = 0
      - log* n = 1 + log*(ceil(log n)) for n > 1
  • Minimal internal memory is required
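To give a feel for how slowly log* n grows, the recursive definition can be evaluated directly. The sketch assumes base-2 logarithms, which the slide does not state explicitly, and computes ceil(log2 n) exactly with integer arithmetic; the name log_star is illustrative.

```python
def log_star(n):
    """Iterated logarithm: how often ceil(log2) must be applied to reach 1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    count = 0
    while n > 1:
        n = (n - 1).bit_length()  # equals ceil(log2(n)) for integers n >= 1
        count += 1
    return count


for n in (1, 2, 16, 65536, 2 ** 65536):
    print(log_star(n))
```

Even for n = 2^65536 the value is only 5, which is why a depth of Θ(log* n) is effectively a small constant number of disk accesses for any realistic key set.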

Results (Cont.)
  • Prefix-Matching
      - The probable range of strings starting with the given prefix is identified using methods similar to Lookup; this takes Θ(log* n) disk accesses
      - After a successful search, retrieving the resulting strings takes O(n/B) more disk accesses if NAND memory is used (B is the block read size)
      - Sorted placement of the strings saves many string comparisons
      - Internal memory requirement is minimal
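Because the matching strings are stored contiguously, the retrieval cost is simply the number of blocks that the result range overlaps. The following sketch counts those blocks, assuming the range is given as byte offsets and B is measured in bytes; the slides do not specify the unit, and the function name is hypothetical.

```python
def blocks_touched(start_byte, end_byte, block_bytes):
    """Count the blocks overlapped by the half-open byte range [start_byte, end_byte)."""
    if end_byte <= start_byte:
        return 0
    first_block = start_byte // block_bytes
    last_block = (end_byte - 1) // block_bytes
    return last_block - first_block + 1


# A 20-byte result that straddles a 512-byte block boundary costs two reads.
print(blocks_touched(500, 520, 512))
```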

Limitations
  • Space is wasted in each disk block when storing the DiskTrie nodes
  • In some cases, the same disk blocks are accessed more than once

Future Directions
  • More efficient storage management, especially removing the wastage inherent in maintaining the block-boundary property
  • Take advantage of spatial locality
