I/O-Efficient Spatial Data Structures: Observations on the d-Dimensional Grid File
Stuart A. MacGillivray and Bradford G. Nickerson
Faculty of Computer Science, University of New Brunswick, Fredericton, New Brunswick, Canada

Structure Definition and Motivations
§ The Grid File is a linear-space structure for storing multidimensional point data on a disk, allowing I/O-efficient search.
§ Points are stored on the disk in cell-blocks of fixed size. Subdirectories, also stored on the disk in fixed-size blocks, contain pointers to these cell-blocks and linear scales describing their extent.
§ Main memory M contains pointers to subdirectories and coarser linear scales. A given point can thus be retrieved in two disk accesses: one for the appropriate subdirectory and one for the block of points.
§ Cells and subdirectories are spatially determined. Partitioning takes place dynamically as points are added.
§ Primary motivation: storage of large amounts of data (e.g. millions to billions of data points) and retrieval of data with minimal I/O operations.

Assumptions for tests: disk page size of 4 kB, two-dimensional 32-bit indexing, 24 bytes per point, B = 900 blocks per subdirectory, C = 170 points per block.

[Figure: map of the test data partitioned by the grid, with x and y linear scales and labelled regions R1 and R2.]
Test data and map courtesy of USGS Navigation Boomer Survey, http://quashnet.er.usgs.gov/data/1999/99023/navigation/boomer/index.htm

Range Queries and Limitations
§ Range searches require few disk accesses; in the worst case, i.e. R2, a range search would require 2^d times as many disk accesses as retrieval of a single point.
§ Range search is possible in O(2^d + K/B) I/Os.
§ The main directory is limited by the constraints of main memory. Assuming main memory M of 4 GB and the test assumptions, a grid file could index a maximum of 1.53 x 10^14 points, around 150 terabytes of data.
§ Other limitations stem from the structure itself: as currently written, divisions between blocks and subdirectories are determined as points are added to the file, so poorly distributed points may result in an inefficient structure.

Theoretical Extensions
§ Extensible with additional layers of subdirectories, i.e. k layers.
§ Every additional layer increases the number of disk accesses needed to retrieve a point by 1, and increases the capacity by a factor of B.
§ k layers store up to N = B^k C M points, e.g. k = 4 => N = 1.2 x 10^20.
§ Sufficient extension in this regard gives logarithmic time for point and range queries.
§ Single point retrieval: 1 + k disk accesses, with k at least log_B(N / (C M)).

References:
Jürg Nievergelt, Hans Hinterberger, and Kenneth C. Sevcik. The grid file: An adaptable, symmetric multikey file structure. ACM Trans. Database Syst., 9(1):38-71, 1984.
Klaus Hinrichs. Implementation of the grid file: Design concepts and experience. BIT, 25(4):569-592, 1985.
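
Worked example: the capacity formula N = B^k C M and the bound k >= log_B(N / (C M)) from the Theoretical Extensions section can be checked with a few lines of Python. This is only a sketch under stated assumptions. B and C are the test values given above, but the value M = 10^9 is an assumption not stated in the source (4 GB of main memory at 4 bytes per subdirectory pointer). The function names are hypothetical. Under this reading, k = 1 reproduces the 1.53 x 10^14 figure, and 1.2 x 10^20 is reached at k = 3, so the poster may be counting layers differently.

```python
# Test assumptions taken from the poster.
B = 900  # blocks (pointers) per subdirectory
C = 170  # points per block

# ASSUMPTION (not stated in the source): main memory holds 10**9
# subdirectory pointers, i.e. 4 GB of memory at 4 bytes per pointer.
M = 10**9


def capacity(k: int, b: int = B, c: int = C, m: int = M) -> int:
    """Maximum number of points indexed by k subdirectory layers: B^k * C * M."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return b**k * c * m


def min_layers(n: int, b: int = B, c: int = C, m: int = M) -> int:
    """Smallest k >= 1 with capacity(k) >= n, i.e. k >= log_B(n / (C * M)).

    Exact integer arithmetic is used instead of math.log to avoid
    floating-point rounding at the layer boundaries.
    """
    k = 1
    while capacity(k, b, c, m) < n:
        k += 1
    return k


def point_query_accesses(k: int) -> int:
    """Disk accesses for one point: k subdirectory reads plus one data block."""
    return 1 + k


if __name__ == "__main__":
    for k in range(1, 5):
        print(f"k={k}: capacity={capacity(k):.3e} points, "
              f"point query={point_query_accesses(k)} disk accesses")
```

For instance, under the same assumptions `min_layers(10**15)` returns 2: a single layer tops out at 1.53 x 10^14 points, so indexing 10^15 points needs a second layer and therefore three disk accesses per point query.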