Alexandria Digital Library Project
Goals and Challenges in Georeferenced Digital Libraries
Greg Janée

Goals
o Digital library:
  § “an integrated set of services for capturing, cataloging, storing, searching, protecting, and retrieving information”
o ADL:
  § a lightweight, distributed digital library for heterogeneous, georeferenced information
  § a system and an infrastructure
    – supports personal collections ... institutions
    – provides interoperability across spatial data providers

Adjectives
o Heterogeneous
  § remotely-sensed imagery; textual documents
  § multimedia instructional materials; executable models
  § gazetteer placenames
o Georeferenced
  § generalizes to “scientific data”: any highly-structured, metadata-rich information
o Distributed
  § for scalability
o Lightweight
  § accommodate small, cheap (i.e., free) implementations
  § include non-traditional spatial data sources

Where we are today
o Downloadable server software, two clients
o In operational use by MIL
o Other (potential) users:
  § Bren/ESSW
  § Scripps
  § DLESE
  § Norwegian National Library
  § Auckland University of Technology

Challenges
o Discovery
o Gazetteers
o Ranking
o Scalability
o Context
o Client integration
o More at http://www.dlib.org/

Challenge 1: discovery
o Can’t beat word search when it works
  § I want a map of Boulder
  § “Downtown street map of Boulder, Colorado”
o But there are so many names for a place ...
  § Boulder, Arapahoe County, Colorado
  § Chautauqua, Mapleton Hill, Pearl Street Mall
  § Area code 303, ZIP code 80305, UTM grid 13S
  § Flatirons, Rocky Mountains, Front Range
  § Landers earthquake, hurricane Hugo

If you’re still not convinced ...
o Remote-sensing imagery is nameless
  § “AVHRR NOAA-13 2002-06-03 14:33 UTC”
o Challenge: exactly which two words will find a USGS map of the Flatirons in the Rocky Mountains behind Boulder, Arapahoe County, Colorado?
  § Eldorado Springs

ADL approach
o Coordinate-based representation and discovery
  § generic lat/lon coordinates
  § rich geometry
    – polygons, polylines
  § spatial operators
    – overlaps, contains
o Gazetteer
  § defines representation of places
  § maps placenames to coordinates
[Diagram labels: client, placenames, gazetteer, coordinates, library]

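
The coordinate-based approach on this slide can be illustrated with a minimal sketch. It is not ADL's implementation: the `Box` class, the `GAZETTEER` and `CATALOG` tables, and the `search` function are hypothetical names, footprints are simplified to lat/lon bounding boxes instead of polygons or polylines, the coordinates are rough illustrative values, and boxes that cross the antimeridian are ignored.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned lat/lon bounding box (a simplified, hypothetical footprint)."""
    west: float
    south: float
    east: float
    north: float

    def overlaps(self, other: Box) -> bool:
        # True when the two boxes share at least one point.
        return (self.west <= other.east and other.west <= self.east
                and self.south <= other.north and other.south <= self.north)

    def contains(self, other: Box) -> bool:
        # True when `other` lies entirely inside this box.
        return (self.west <= other.west and other.east <= self.east
                and self.south <= other.south and other.north <= self.north)


# Toy gazetteer: placename -> footprint (illustrative coordinates only).
GAZETTEER = {
    "Boulder": Box(-105.30, 39.96, -105.18, 40.09),
    "Colorado": Box(-109.06, 36.99, -102.04, 41.00),
}

# Toy library catalog: item title -> footprint (illustrative coordinates only).
CATALOG = {
    "World Map": Box(-180.0, -90.0, 180.0, 90.0),
    "Eldorado Springs quadrangle": Box(-105.375, 39.875, -105.25, 40.0),
    "Denver street map": Box(-105.11, 39.61, -104.60, 39.91),
}


def search(placename: str, operator: str = "overlaps") -> list[str]:
    """Resolve a placename to coordinates, then query the catalog spatially."""
    region = GAZETTEER[placename]
    if operator == "overlaps":
        return [t for t, fp in CATALOG.items() if fp.overlaps(region)]
    # "contains": keep items whose footprint contains the whole query region.
    return [t for t, fp in CATALOG.items() if fp.contains(region)]
```

With these toy values, `search("Boulder")` finds the quadrangle even though its title never mentions Boulder, which is the point of searching by coordinates instead of by words.
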
Challenge 2: gazetteers: necessary evil
o Few (public) sources of gazetteer data
o Lousy quality
  § digitized from maps
o Difficult problems
  § conflation
  § classification
  § boundary determination
  § change over time
o Conclusion
  § gazetteer-based spatial reasoning seems unlikely
  § interaction will likely remain client-centric

Final thoughts on discovery
o Coordinate-based approach is costly
  § burden on users and catalogers
  § limits potential collections
  § relies on gazetteer’s weakest aspect: footprints
  § continuous coordinate space adds complexity
o Gazetteer improvements
  § federated gazetteers
  § new gazetteer models: topological as opposed to metric
o Other coordinate spaces, grids, etc.

Challenge 3: ranking
o Observed phenomenon:
  § World Map is first result of every query
o Idea: rank by spatial similarity to query region
[Figure: query region with four result footprints, labeled 1-4]

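
The slide does not say which similarity measure is meant. As one possible reading, the sketch below scores each footprint by the ratio of intersection area to union area (the Jaccard index) against the query box. The function names, the `(west, south, east, north)` tuple layout, and the coordinates are assumptions for illustration, and areas are computed in plain degrees with no correction for latitude.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (west, south, east, north), hypothetical layout


def area(box: Box) -> float:
    """Planar area in square degrees; zero for empty or inverted boxes."""
    west, south, east, north = box
    return max(0.0, east - west) * max(0.0, north - south)


def similarity(a: Box, b: Box) -> float:
    """Jaccard index: intersection area divided by union area."""
    inter = area((max(a[0], b[0]), max(a[1], b[1]),
                  min(a[2], b[2]), min(a[3], b[3])))
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def rank(query: Box, catalog: Dict[str, Box]) -> List[str]:
    """Return titles ordered from most to least similar to the query region."""
    return sorted(catalog, key=lambda t: similarity(query, catalog[t]), reverse=True)


# Illustrative footprints only.
CATALOG: Dict[str, Box] = {
    "World Map": (-180.0, -90.0, 180.0, 90.0),
    "Colorado state map": (-109.06, 36.99, -102.04, 41.00),
    "Eldorado Springs quadrangle": (-105.375, 39.875, -105.25, 40.0),
    "Boulder city map": (-105.31, 39.95, -105.17, 40.10),
}

BOULDER: Box = (-105.30, 39.96, -105.18, 40.09)
ranking = rank(BOULDER, CATALOG)
```

Under this measure a footprint much larger than the query is penalized by its large union area, so the World Map falls to the bottom of `ranking` instead of leading every result list.
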
Challenge 4: scalability
o Easy to accumulate lots of data
  § satellites image continuously
  § 1 m resolution, Earth’s surface area = 5 × 10¹⁴ m²
o Support for scalability
  § text: amazingly good
  § spatial: not so good
    – indexing becomes unwieldy at 10⁶ items
  § combining spatial with other constraint types is difficult

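
To make the data-volume figure concrete, the short calculation below works out how many 1 m pixels cover the Earth's surface once. The one-byte-per-pixel storage figure is an assumption added here for illustration and does not come from the slide; the function name is likewise hypothetical.

```python
EARTH_SURFACE_M2 = 5e14   # from the slide: Earth's surface area in square meters
RESOLUTION_M = 1.0        # from the slide: 1 m ground resolution
BYTES_PER_PIXEL = 1       # assumption: one uncompressed single-band byte per pixel


def coverage_pixels(surface_m2: float, resolution_m: float) -> float:
    """Number of pixels needed to cover the surface once at the given resolution."""
    return surface_m2 / (resolution_m ** 2)


pixels = coverage_pixels(EARTH_SURFACE_M2, RESOLUTION_M)
terabytes = pixels * BYTES_PER_PIXEL / 1e12  # decimal terabytes per global coverage
```

Under these assumptions a single global coverage is 5 × 10¹⁴ pixels, or roughly 500 TB, before counting repeat passes or multiple spectral bands.
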
ADL approach
o Partition and distribute the problem
o Multiple levels of discovery
  § find relevant collections
  § search just those collections
o Support multiple implementation strategies
  § spatial engine
  § relational database
  § home-grown

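
The two levels of discovery can be sketched as follows. This is a simplified illustration, not ADL's protocol: the `Collection` class, the `discover` function, the collection names, and the coordinates are all hypothetical. Each collection is assumed to publish a summary footprint covering its items, and in a real deployment the per-collection search could be backed by a spatial engine, a relational database, or home-grown code.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (west, south, east, north), hypothetical layout


def overlaps(a: Box, b: Box) -> bool:
    """True when two lat/lon boxes share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Collection:
    name: str
    footprint: Box          # collection-level summary covering all items
    items: Dict[str, Box]   # item title -> item footprint


def discover(query: Box, collections: List[Collection]) -> Tuple[List[str], List[str]]:
    """Level 1: pick relevant collections. Level 2: search only those."""
    searched: List[str] = []
    hits: List[str] = []
    for coll in collections:
        if not overlaps(coll.footprint, query):
            continue  # skip the whole collection without touching its items
        searched.append(coll.name)
        hits.extend(t for t, fp in coll.items.items() if overlaps(fp, query))
    return searched, hits


# Illustrative collections only.
COLLECTIONS = [
    Collection("Colorado maps", (-109.06, 36.99, -102.04, 41.00), {
        "Eldorado Springs quadrangle": (-105.375, 39.875, -105.25, 40.0),
        "Denver street map": (-105.11, 39.61, -104.60, 39.91),
    }),
    Collection("Norway aerial photos", (4.0, 57.0, 32.0, 72.0), {
        "Oslo harbor": (10.6, 59.8, 10.9, 60.0),
    }),
]

searched, hits = discover((-105.30, 39.96, -105.18, 40.09), COLLECTIONS)
```

Only the collection whose summary footprint overlaps the query is searched at the item level, so the cost of a query grows with the number of relevant collections instead of with the total number of items.
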
Challenge 5: context
o Context is critical for evaluation
o Textual context: poem software

Geospatial context
o Does this answer your question?
[Map labels: Flagstaff Rd., Flatirons 1-5, Green Mountain]

Challenge 6: client integration
o “Click here” approach places large burden on users
  § navigate
  § interpret
  § evaluate
  § download
o Service-based access will become predominant
  § just as the WWW replaced FTP
o Needed:
  § description/access standards, protocols
  § integration with search constraints