CPSC 335 Computer Science University of Calgary Canada

Outline
• Coalesced Hashing
• Variants
• Brent’s Method
• Binary Tree
• Comparison of various methods

Coalesced Hashing
• Coalesced hashing is a collision resolution method that uses pointers to connect the elements of a synonym chain.
  – It is a hybrid of separate chaining and open addressing.
  – Linked lists within the hash table handle collisions.
  – This strategy is effective, efficient, and very easy to implement.

Coalesced Hashing
• Coalesced hashing obtains its name from what occurs when we attempt to insert a record whose home address is already occupied by a record from a chain with a different home address.
• This situation would occur, for example, if we attempted to insert a record with a home address of s into the hash table while location s holds a record from another chain.
• The two chains, whose records have different home addresses, then coalesce, or grow together.

Coalesced Hashing
• In the figure, the records with keys X, D, and Y were inserted into the hash table in the given order. A, B, C, and D form one set of synonyms, and X and Y form another set.
• When X is inserted into the table with coalescing, it must be inserted at the end of the chain that it is coalescing with. Instead of needing only one probe to retrieve X, three are needed. The greater the coalescing, the longer the probe chain will be, and as a result retrieval performance is degraded.
• When record D is then added, it must be inserted at the end of the coalesced chains. To locate D, we must pass over record X, which belongs to the other chain.
Figure: Synonym chain with coalescing. (The shaded portion indicates the part of the chain in which coalescing has occurred. The thin line represents the insertions on the synonym chain with r as its home address. The thick line represents the insertions on the chain with s as its home address.)

Coalesced Hashing
Algorithm for Coalesced Hashing
• Coalesced hashing originated with Williams [1] and is also referred to as direct chaining.
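
The algorithm itself did not survive in this transcript, so the following is only a minimal Python sketch of late-insertion coalesced hashing without a cellar, not the slide's own listing. The class name CoalescedTable, the hash function key mod table size, and the integer keys are all assumptions made for illustration.

```python
class CoalescedTable:
    """Late-insertion coalesced hashing with no cellar (illustrative sketch)."""

    def __init__(self, size):
        self.keys = [None] * size   # None marks an unoccupied location
        self.link = [None] * size   # index of the next record in the chain
        self.free = size - 1        # unoccupied locations are taken from the bottom

    def home(self, key):
        return key % len(self.keys)  # assumed hash function for integer keys

    def insert(self, key):
        i = self.home(key)
        if self.keys[i] is None:             # home address is unoccupied
            self.keys[i] = key
            return i
        while True:                          # follow the chain to its end
            if self.keys[i] == key:
                raise KeyError(f"duplicate key {key}")
            if self.link[i] is None:
                break
            i = self.link[i]
        while self.free >= 0 and self.keys[self.free] is not None:
            self.free -= 1                   # move up from the bottom to a free slot
        if self.free < 0:
            raise OverflowError("table is full")
        self.keys[self.free] = key
        self.link[i] = self.free             # late insertion: link at the chain's end
        return self.free

    def probes(self, key):
        """Number of probes to retrieve key, or None if it is absent."""
        i, count = self.home(key), 1
        while i is not None and self.keys[i] is not None:
            if self.keys[i] == key:
                return count
            i, count = self.link[i], count + 1
        return None


table = CoalescedTable(11)
for k in (27, 18, 29, 28, 39, 13, 16, 42):
    table.insert(k)
```

In this run, key 42 hashes to location 9, which already holds 39 from the chain whose home address is 6. The two chains coalesce, and 42 needs two probes even though it is the first key with home address 9.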

Variants
• Many suggestions have been made for reducing the coalescing of probe chains, thereby lowering the number of retrieval probes and improving performance.
• The variants may be classified in three ways:
  – The table organization (whether or not a separate overflow area is used).
  – The manner of linking a colliding item into a chain.
  – The manner of choosing unoccupied locations.

Variants
• Coalescing may be reduced by modifying the table organization.
• Instead of allocating the entire table space to both overflow records and home address records, the table is divided into a primary area and an overflow area.
  – The primary area is the address space that the hash function maps into.
  – The overflow, or cellar, area contains only overflow records.
  – The address factor is the ratio of the primary area to the total table size: Address Factor = primary area / total table size.
Figure: Table divided into a Primary area and an Overflow (cellar) area.

Variants
• For a fixed amount of storage, as the address factor decreases, the cellar size increases. This reduces coalescing, but because the primary area becomes smaller, it increases the number of collisions.
• More collisions mean more items requiring multiple retrieval probes.
• Vitter [2] determined that an address factor of 0.86 yields nearly optimal retrieval performance for most load factors.
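
As a small worked example of the address factor, the sketch below splits a table into a primary area and a cellar. The function names split_table and home_address, the table size of 1000, and the hash function key mod primary size are assumptions for illustration only.

```python
def split_table(total_slots, address_factor=0.86):
    """Return (primary, cellar) sizes for a given address factor."""
    primary = round(total_slots * address_factor)  # slots the hash function maps into
    cellar = total_slots - primary                 # slots reserved for overflow records
    return primary, cellar


def home_address(key, primary):
    """Assumed hash function: home addresses fall only in the primary area."""
    return key % primary


primary, cellar = split_table(1000)
```

With 1000 slots and an address factor of 0.86, the hash function maps into 860 primary locations and the remaining 140 form the cellar.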

Variants
LISCH
• The coalesced hashing algorithm given earlier is called Late Insertion Standard Coalesced Hashing (LISCH), since new records are inserted at the end of a probe chain. The ‘Standard’ in the name refers to the lack of a cellar.
• The variant of that algorithm that uses a cellar is called LICH, Late Insertion Coalesced Hashing.

Variants
• Another way of varying the insertion algorithm is to change the way in which we choose an unoccupied location. In the basic algorithm, unoccupied locations are always chosen from the bottom of the storage area, but this increases the number of collisions.
• Hsiao [3] suggests REISCH (‘R’ stands for ‘Random’), in which a random unoccupied location is chosen for the new insertion. REISCH gives only a 1% improvement over EISCH (Early Insertion Standard Coalesced Hashing, in which a colliding record is linked into the chain immediately after the record at its home address).
• BLISCH (‘B’ signifies ‘Bidirectional’) chooses the overflow location for a collision insertion by alternating the selection between the top and bottom of the table.
• In DCWC (Direct Chaining Without Coalescing), a record that is not stored at its home address is moved when a new record hashes to the location it occupies.
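
The three ways of choosing an unoccupied location can be sketched as interchangeable selection policies. This covers only the slot-selection step, not a full REISCH or BLISCH implementation. All names here are hypothetical, and starting the alternation at the top of the table is an assumption.

```python
import random


def pick_from_bottom(slots):
    """Highest-indexed free slot, as in the basic algorithm."""
    for i in range(len(slots) - 1, -1, -1):
        if slots[i] is None:
            return i
    return None


def pick_from_top(slots):
    """Lowest-indexed free slot."""
    for i, value in enumerate(slots):
        if value is None:
            return i
    return None


def pick_random(slots, rng=random):
    """A random free slot, in the spirit of REISCH."""
    free = [i for i, value in enumerate(slots) if value is None]
    return rng.choice(free) if free else None


class AlternatingPicker:
    """Alternate between top and bottom, in the spirit of BLISCH."""

    def __init__(self):
        self.use_top = True  # assumption: the first overflow goes to the top

    def pick(self, slots):
        chooser = pick_from_top if self.use_top else pick_from_bottom
        self.use_top = not self.use_top
        return chooser(slots)


slots = ["a", None, "b", None, None, "c"]
picker = AlternatingPicker()
first, second = picker.pick(slots), picker.pick(slots)
```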

Variants
Table 1: Mean number of probes for successful lookup (n = 997) for variants of Coalesced Hashing

Brent’s Method
• Dynamic collision resolution methods are methods in which an item, once stored, may be moved. With these methods any item may be moved, not only those records that are not stored at their home addresses.
• These methods require additional processing when inserting a record into the table but reduce the number of probes needed for retrieval.
• The justification for this additional processing is that we usually insert an item into a table only once but retrieve it many times.

Brent’s Method
• The Primary Probe Chain of a record is the sequence of locations visited during the insertion or retrieval of the record.
• The sequence of positions visited when attempting to move a record from the primary probe chain is called the Secondary Probe Chain.
• We want to minimize the total number of probes for both the item being inserted and the items already in the table. This strategy assumes an equal likelihood of any of the items being retrieved.

Brent’s Method
• Brent’s method is the first of several dynamic collision resolution methods. Each of them considers moving a previously stored item to reduce the number of retrieval probes.
  – The solid vertical line represents the primary probe chain.
  – The horizontal lines represent the secondary probe chains.
  – The q value along the primary probe chain is the increment for the item being inserted, whereas the q_i values along the secondary probe chains are the increments associated with the items being moved.
Figure: Brent’s method, probe chains, and their order of processing

Brent’s Method
• The subscript i gives the number of probes needed to retrieve the item being inserted along its primary probe chain.
• The subscript j gives the number of additional probes needed to retrieve the item being moved along its secondary probe chain.
• To minimize the number of retrieval probes, (i + j) is minimized. In the case of a tie in (i + j), we arbitrarily choose to minimize on i.
• When we can no longer achieve a reduction in the number of retrieval probes, we should stop attempting to move an item.

Brent’s Method
• Let s be the number of probes required to retrieve an item if nothing is moved. We then try all combinations with (i + j) < s, choosing the one that minimizes (i + j).
• When (i + j) = s there would be no reduction in the number of probes, so no movement occurs.

Brent’s Method
Algorithm for insertion into a file using Brent’s method
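
The algorithm listing is missing from this transcript. The rules stated on the preceding slides (find s, then try every (i + j) < s in increasing order, breaking ties on the smaller i) can be sketched in Python as follows. This is an illustration under assumptions, not the slide's listing: integer keys, a prime table size, home address key mod n, and an increment taken from the quotient key div n (a linear-quotient scheme) are all choices made here.

```python
def increment(key, n):
    """Assumed increment: the quotient mod n, replaced by 1 if it would be 0."""
    return (key // n) % n or 1


def brent_insert(table, key):
    n = len(table)                   # assumed prime, so a chain visits every slot
    home, q = key % n, increment(key, n)

    # s = probes needed to retrieve the new item if nothing is moved.
    pos, s = home, 1
    while table[pos] is not None:
        if table[pos] == key:
            raise KeyError(f"duplicate key {key}")
        if s == n:
            raise OverflowError("table is full")
        pos, s = (pos + q) % n, s + 1

    # Try every (i, j) with i + j < s: smallest i + j first, ties on smaller i.
    for total in range(2, s):
        for i in range(1, total):
            j = total - i
            p = (home + (i - 1) * q) % n       # i-th location on the primary chain
            resident = table[p]
            target = (p + j * increment(resident, n)) % n  # j steps along its chain
            if table[target] is None:
                table[target] = resident       # move the stored item
                table[p] = key                 # new item is now found in i probes
                return p
    table[pos] = key                           # no reduction possible
    return pos


def probes(table, key):
    """Number of probes to retrieve key, or None if it is absent."""
    n = len(table)
    pos, q = key % n, increment(key, n)
    for count in range(1, n + 1):
        if table[pos] is None:
            return None
        if table[pos] == key:
            return count
        pos = (pos + q) % n
    return None


table = [None] * 11
for k in (27, 18, 29, 28, 39, 13, 16):
    brent_insert(table, k)
```

In this run, inserting 16 would need s = 6 probes at the end of its primary chain. Moving 27 three steps along its secondary chain (i = 1, j = 3) lets 16 be stored at its home address, for a total cost of 4 instead of 6.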

Binary Tree
• A question that is often asked when considering Brent’s collision resolution method is, “If it is a good idea to move an item on a primary probe chain, why not carry this concept one step further and move items from secondary and subsequent probe chains?”
• Two features of the binary tree collision resolution method make it worth considering:
  – It needs fewer retrieval probes than Brent’s method.
  – Perhaps more importantly, it illustrates the importance of choosing an appropriate data structure in order to solve a problem effectively.

Binary Tree
• The binary tree collision resolution method uses a binary tree structure to determine when to move an item and where to move it.
• A binary tree is appropriate because there are essentially two choices at each potential storage address: continue to the next address along the probe chain of the item being inserted, or move the item stored at that address to the next position on its own probe chain.
• A left branch in the binary tree signifies the continue option and a right branch the move option.

Binary Tree
• The binary decision tree is generated in a breadth-first fashion, from the top down and left to right, as shown:
  – The binary tree is used only as a control mechanism in deciding where to store an item and is not used for storing records.
  – A different binary tree is constructed for each insertion of a record.
  – By moving items from secondary and subsequent probe chains, the method achieves a placement of records that further reduces the average number of retrieval probes compared with Brent’s method.
Figure: Binary decision tree
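
The breadth-first generation of the decision tree can be sketched with a queue. This is a minimal illustration under assumptions: integer keys not already in the table, a prime table size, home address key mod n, and an increment taken from the quotient key div n. The tuple layout for tree nodes is hypothetical, and the tree may grow large before an empty location is found.

```python
from collections import deque


def increment(key, n):
    """Assumed increment: the quotient mod n, replaced by 1 if it would be 0."""
    return (key // n) % n or 1


def binary_tree_insert(table, key):
    n = len(table)                    # assumed prime; key assumed not yet stored
    if None not in table:
        raise OverflowError("table is full")
    # Node = (address, item looking for a slot, parent node, reached by a move?)
    queue = deque([(key % n, key, None, False)])
    while queue:
        node = queue.popleft()        # breadth first, left to right
        addr, carried, parent, _ = node
        resident = table[addr]
        if resident is None:
            table[addr] = carried     # first empty location found
            while parent is not None:
                if node[3]:           # right branch: the parent's resident moved down,
                    table[parent[0]] = parent[1]  # so the parent's item takes its slot
                node, parent = parent, parent[2]
            return
        # Left branch: continue along the probe chain of the item being placed.
        queue.append(((addr + increment(carried, n)) % n, carried, node, False))
        # Right branch: move the resident to the next position on its own chain.
        queue.append(((addr + increment(resident, n)) % n, resident, node, True))


table = [None] * 11
for k in (27, 18, 29, 28, 39, 13, 16):
    binary_tree_insert(table, k)
```

In this run, inserting 16 takes a left branch from location 5 to location 6, then moves 39 from location 6 two steps along its own chain to location 1. That move starts from a location reached by continuing rather than from the home address, which is the extra freedom the tree adds over moving only at the first step.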

Comparison
Table 2: Comparison of mean number of probes for successful lookup (n = 997; α = packing factor)
• Table 2 provides the average number of retrieval probes for successful searches on a table of 997 records with a uniform distribution of keys.

Comparison
• Figure 5 graphically displays the performance data for all methods except computed chaining with a 2-bit link field.
Figure 5: Performance of collision resolution methods

Comparison
• Note the wide variance in performance at packing factors of 90 percent and above.
• The result for computed chaining at a 20 percent packing factor is less than that for DCWC (Direct Chaining Without Coalescing).

Comparison
Table 3: Search, relocation and storage comparisons
• Table 3 offers additional useful comparison criteria.
• The successful search criteria give the minimum and maximum number of probes necessary to retrieve an item.

Comparison
• In Table 3, the range for worst case performance varies from ln n to n.
• Although the worst case performance for locating a record with both LISCH and computed chaining is n, their typical performance would be better, because only the records of one chain need to be searched.

Comparison
• What is the best method? There is no single method that is best for all purposes.
  – The method that provides the lowest average number of probes, and thus in general the best performance, is DCWC.
  – The method with the second lowest average number of retrieval probes is computed chaining.
  – LISCH without coalescing is DCWC, which does perform better than computed chaining.
  – If storage is somewhat scarce, computed chaining has an advantage over DCWC.

Comparison
Table 4: Advantages, disadvantages, and when to use various collision resolution methods