Digitization of Visual Resources
Jenn Riley, Metadata Librarian
IU Digital Library Program
October 21, 2004
L597: Humanities Computing
Technical overview
- Analog to digital conversion
- Resolution
- Bit depth
- Color representation
- Reflectivity and polarity
- Compression
Analog to digital conversion
- Image is converted to a series of pixels laid out in a grid
- Each pixel has a specific color, represented by a sequence of 1s and 0s
- Pixel-based images are called “raster” images or “bitmaps”
Resolution (1)
- Often referred to as “dpi” or “ppi”
- RATIO of the number of pixels captured to each inch of the original's size
  - 8 x 10 inch print scanned at 300 ppi = 2400 x 3000 pixels
  - 35 mm slide (24 x 36 mm!) scanned at 300 ppi ≈ 283 x 425 pixels
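The ratio on this slide can be worked through in a few lines. This is a minimal sketch, not part of the original slides; the function names `pixel_dimensions` and `mm_to_inches` are made up for illustration, and pixel counts are rounded to whole pixels.

```python
MM_PER_INCH = 25.4

def mm_to_inches(mm):
    """Convert millimetres to inches."""
    return mm / MM_PER_INCH

def pixel_dimensions(width_in, height_in, ppi):
    """Pixel dimensions of a scan, given the original's size in inches and the scan ppi."""
    return round(width_in * ppi), round(height_in * ppi)

# 8 x 10 inch print at 300 ppi
print_scan = pixel_dimensions(8, 10, 300)
print(print_scan)  # (2400, 3000)

# 35 mm slide frame (24 x 36 mm) at the same 300 ppi
slide_scan = pixel_dimensions(mm_to_inches(24), mm_to_inches(36), 300)
print(slide_scan)  # (283, 425)
```

The same ppi gives very different pixel counts because the originals differ so much in physical size, which is why small film originals are scanned at much higher ppi than prints.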
Resolution (2)
- “Spatial resolution” refers to the pixel dimensions of the image, e.g., 3000 x 2400 pixels
- Flatbed and film scanners have a fixed focus, so they know how big the original is; digital cameras don’t
Bit depth (1)
- Refers to the number of bits (binary digits, places for zeroes and ones) devoted to storing color information about each pixel
- 1 bit (1) = 2^1 = 2 shades (“bitonal”)
- 2 bit (01) = 2^2 = 4 shades
- 4 bit (0010) = 2^4 = 16 shades
- 8 bit (11010001) = 2^8 = 256 shades (“grayscale”)
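The pattern behind these figures is that each added bit doubles the number of representable shades. A small sketch, with `shades` as an illustrative name that does not come from the slides:

```python
def shades(bit_depth):
    """Number of distinct values representable with the given number of bits."""
    return 2 ** bit_depth

for bits in (1, 2, 4, 8):
    # Highest value at this depth, written out in binary with leading zeros.
    highest = format(shades(bits) - 1, "b").zfill(bits)
    print(f"{bits}-bit: {shades(bits)} shades, highest value {highest}")
```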
Bit depth (2)
[Image comparison: 1 bit (black & white), 2 bit (4 colors), 4 bit (16 colors), 8 bit (256 colors)]
Color representation
- RGB
  - Scanners generally have sensors for Red, Green, and Blue
  - Each of these “channels” is stored separately in the digital file
  - 8 bits for each of 3 channels = 24-bit color
- CMYK (Cyan, Magenta, Yellow and Black) is used for high-end “pre-press” printing purposes
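Channels and bit depth together determine how much raw data a scan produces. The sketch below estimates uncompressed pixel data only; it ignores file headers and embedded metadata, and `uncompressed_bytes` is an illustrative name rather than anything from the slides.

```python
def uncompressed_bytes(width_px, height_px, channels=3, bits_per_channel=8):
    """Size of the raw pixel data in bytes (no header, no compression)."""
    bits_per_pixel = channels * bits_per_channel
    return width_px * height_px * bits_per_pixel // 8

# 8 x 10 inch print scanned at 300 ppi (2400 x 3000 pixels)
rgb_size = uncompressed_bytes(2400, 3000)               # 24-bit RGB
gray_size = uncompressed_bytes(2400, 3000, channels=1)  # 8-bit grayscale

print(rgb_size)   # 21600000
print(gray_size)  # 7200000
```

A 24-bit RGB scan carries three times the pixel data of an 8-bit grayscale scan of the same dimensions.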
Reflectivity and polarity
              Positive                      Negative
Reflective    Paper, photographic prints    -
Transmissive  Slide film                    Negative film
Compression
- Makes files smaller for storage
- Files must be decompressed for viewing
- Lossless
- Lossy
  - “visually lossless”
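The difference between lossless and lossy compression can be shown on synthetic data. This is only a sketch under stated assumptions: the "image" is a made-up grayscale gradient with random noise, `zlib` stands in for a lossless codec, and throwing away the low 4 bits of each pixel stands in for the lossy step. Real image codecs such as JPEG work very differently.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the example is repeatable

# Synthetic 8-bit grayscale image: a horizontal gradient plus sensor-like noise.
pixels = bytes(min(255, x + rng.randrange(16)) for _ in range(64) for x in range(256))

# Lossless: decompression returns exactly the original bytes.
packed = zlib.compress(pixels)
restored = zlib.decompress(packed)

# Lossy (toy version): discard the 4 low-order bits of every pixel, then compress.
quantized = bytes(p & 0xF0 for p in pixels)
packed_lossy = zlib.compress(quantized)
max_error = max(abs(a - b) for a, b in zip(pixels, quantized))

print("original bytes:   ", len(pixels))
print("lossless bytes:   ", len(packed), "identical after decompress:", restored == pixels)
print("lossy bytes:      ", len(packed_lossy), "largest per-pixel error:", max_error)
```

The lossy version is smaller because information was thrown away before compression, and that information cannot be recovered afterwards.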
Technical questions?
- Analog to digital conversion
- Resolution
- Bit depth
- Color representation
- Reflectivity and polarity
- Compression
Setting specifications
- Standards & best practices
- Capture once, use many
- Determine purpose
- Resolution
- Bit depth & color
- File formats
- Quality control
Standards & best practices
- NARA Technical Guidelines for Digitizing Archival Materials for Electronic Access
- Moving Theory into Practice
  - Book
  - Online tutorial
- California Digital Library Digital Image Format Standards
- Western States Digital Imaging Best Practices
Capture once, use many
- Create master image when scanning
  - Capture all “important” information
  - Meets all foreseeable needs
  - For long-term storage and later use
- Create derivatives for specific uses later
  - Web delivery
  - Printing
  - Publication
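One way to picture the master/derivative relationship is to compute derivative sizes from a single master. The sketch below only does the arithmetic; the target sizes in `DERIVATIVES` and the name `derivative_size` are hypothetical, and actual resampling would need an imaging library.

```python
def derivative_size(master_w, master_h, max_long_side):
    """Scale master dimensions down so the long side fits max_long_side, keeping aspect ratio."""
    long_side = max(master_w, master_h)
    if long_side <= max_long_side:
        return master_w, master_h  # never upscale beyond the master
    scale = max_long_side / long_side
    return round(master_w * scale), round(master_h * scale)

MASTER = (2400, 3000)  # e.g. an 8 x 10 inch print scanned at 300 ppi

# Hypothetical long-side targets for each use, in pixels.
DERIVATIVES = {"thumbnail": 150, "web": 800, "print": 2000}

sizes = {use: derivative_size(*MASTER, limit) for use, limit in DERIVATIVES.items()}
for use, size in sizes.items():
    print(use, size)
```

Every derivative is computed from the master, never from another derivative, so the master can keep serving new uses without rescanning.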
Determine purpose
- Nature of materials
  - Artifactual
  - For content only
- Capture all important information
  - But what’s “important”?
  - Not always “what people can see”
Resolution (1)
- Higher is not always better
- Scan at the highest resolution necessary to achieve your stated purpose, no higher
[Chart from Cornell’s online digital imaging tutorial: <http://www.library.cornell.edu/preservation/tutorial/conversion-03.html>]
Resolution (2)
- Color photographic materials
  - 3000-6000 pixels on the long side
  - 24-bit RGB
- B/W photographic materials
  - 3000-6000 pixels on the long side
  - 8-bit grayscale (unless sepia tone is “important”)
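A pixel target on the long side translates into a different scan ppi for each size of original. A rough sketch of that calculation, where `required_ppi` is an illustrative name and the originals are the 8 x 10 inch print and 35 mm slide used earlier in the deck:

```python
MM_PER_INCH = 25.4

def required_ppi(long_side_inches, target_long_side_px):
    """Scan resolution needed for the original's long side to yield the target pixel count."""
    return target_long_side_px / long_side_inches

print_ppi = round(required_ppi(10, 3000))                # 8 x 10 inch print
slide_ppi = round(required_ppi(36 / MM_PER_INCH, 3000))  # 35 mm slide, 36 mm long side

print(print_ppi)  # 300
print(slide_ppi)  # 2117
```

Specifying pixels on the long side rather than a fixed ppi gives comparable master files regardless of the size of the original.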
Resolution comparison (1)
[Image]
Resolution comparison (2)
[Image comparison: 600 dpi vs. 300 dpi]
Bit depth & color
- Match current photo or match original scene
- Final master images should be 8 bits per channel (8-bit grayscale, 24-bit RGB); some specialized projects use higher bit depths
- Any color adjustments & other processing should be done in scanning software before the final scan is done
- Use almost the full tonal range; avoid “clipping”
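Clipping and tonal range are measurable, so a scan can be checked with a simple pass over its pixel values. The sketch below assumes 8-bit grayscale values in a plain Python list; the function names and sample data are made up for illustration.

```python
def tonal_range(pixels):
    """Darkest and lightest values actually used in the image."""
    return min(pixels), max(pixels)

def clipped_fraction(pixels, low=0, high=255):
    """Fraction of pixels stuck at pure black and at pure white."""
    total = len(pixels)
    shadows = sum(1 for p in pixels if p <= low) / total
    highlights = sum(1 for p in pixels if p >= high) / total
    return shadows, highlights

# Uses almost the full range without touching either end.
well_exposed = list(range(5, 251))
# A quarter of the pixels are blown out to pure white.
blown_highlights = [255] * 25 + [128] * 75

print(tonal_range(well_exposed), clipped_fraction(well_exposed))
print(tonal_range(blown_highlights), clipped_fraction(blown_highlights))
```

A large share of pixels at 0 or 255 suggests detail was lost at capture, which no later processing can restore.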
Master file formats
- TIFF (uncompressed)
  - Virtually unanimously recommended by digital imaging best practices
  - “De facto” standard
- JPEG 2000
  - ISO/IEC IS 15444-1 | ITU-T T.800
  - Not patent-free
  - Up-and-coming but not quite there yet
  - Supports embedded metadata
  - Uses wavelet-based compression
Why not JPEG?
- JPEG files are lossy-compressed again every time they are saved
[Image comparison: low compression, high quality vs. high compression, low quality]
Delivery file formats
- Photographic materials: JPEG
- Text, line drawings: GIF
- PNG?
Quality control
- Essential part of every digitization project
- Objective criteria
  - Can be automated
  - Can check all items
- Subjective criteria
  - Require human checks
  - Must sample
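The split between objective and subjective checks might look like this in practice. It is a sketch only: the `SPEC` values are a hypothetical project specification loosely based on figures elsewhere in these slides, image records are plain dictionaries, and the 10% sampling rate is an arbitrary assumption.

```python
import random

# Hypothetical project specification (an assumption for this example).
SPEC = {"format": "TIFF", "min_long_side": 3000, "max_long_side": 6000, "bit_depths": {8, 24}}

def objective_failures(record, spec=SPEC):
    """Automated check: list every way one image record violates the specification."""
    failures = []
    long_side = max(record["width"], record["height"])
    if record["format"] != spec["format"]:
        failures.append(f"format is {record['format']}, expected {spec['format']}")
    if not spec["min_long_side"] <= long_side <= spec["max_long_side"]:
        failures.append(f"long side is {long_side} px")
    if record["bit_depth"] not in spec["bit_depths"]:
        failures.append(f"bit depth is {record['bit_depth']}")
    return failures

def subjective_sample(records, fraction=0.1, seed=0):
    """Pick a random subset of records for a person to inspect visually."""
    count = max(1, round(len(records) * fraction))
    return random.Random(seed).sample(records, count)

batch = [
    {"id": "img001", "format": "TIFF", "width": 2400, "height": 3000, "bit_depth": 24},
    {"id": "img002", "format": "JPEG", "width": 800, "height": 600, "bit_depth": 24},
]
for record in batch:  # objective criteria: every item is checked
    print(record["id"], objective_failures(record) or "ok")
print([r["id"] for r in subjective_sample(batch, fraction=0.5)])  # subjective: sample only
```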
Specifications questions?
- Standards & best practices
- Capture once, use many
- Determine purpose
- Resolution
- Bit depth & color
- File formats
- Quality control
More information
- NARA Technical Guidelines for Digitizing Archival Materials for Electronic Access
- Moving Theory into Practice
  - Book
  - Online tutorial
- IU DLP Use of Digital Imaging Standards & Best Practices
Visual Resources Metadata in Libraries
Jenn Riley, Metadata Librarian
IU Digital Library Program
What is metadata?
- “Data about data”
A better definition
- Other characteristics
  - Structure
  - Control
- Origin
  - Machine-generated
  - Human-generated
- In practice, often grows to cover data and meta-metadata
Levels of control
- Data structure standards
- Data content standards
Creating metadata
- HTML <meta> tags
- Spreadsheets
- Databases
- XML
- Digital library content management systems
  - ContentDM
  - Greenstone
Types of metadata
- Descriptive metadata
- Administrative metadata
  - Technical metadata
  - Preservation metadata
  - Rights metadata
- Structural metadata
Purposes of descriptive metadata
- Description
- Discovery
General descriptive metadata standards
- Dublin Core [example]
  - Unqualified
  - Qualified
- MARC/AACR2 [example]
- MARCXML
- MODS [example]
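As a rough illustration of what an unqualified Dublin Core description can look like in XML, the sketch below builds one with Python's standard library. The namespace URI is the Dublin Core element set namespace; the `record` wrapper element, the `dc_record` function, and all field values are invented for this example and do not come from the slides.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

def dc_record(fields):
    """Build a simple record; every Dublin Core element is optional and repeatable."""
    root = ET.Element("record")  # hypothetical wrapper element
    for name, values in fields.items():
        for value in values:
            element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            element.text = value
    return root

# Invented sample values describing a digitized photograph.
record = dc_record({
    "title": ["Main Street, looking north"],
    "creator": ["Unknown photographer"],
    "subject": ["Streets", "Storefronts"],
    "type": ["StillImage"],
    "format": ["image/jpeg"],
})
print(ET.tostring(record, encoding="unicode"))
```

The `subject` element appears twice, once per value, rather than packing both values into one field.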
Visual resources metadata standards
- CDWA [example]
- VRA Core [example]
- CCO
Technical metadata
- For recording technical aspects of digital objects
- Of use for long-term maintenance of data
- Standards for still images
  - NISO Z39.87: Data Dictionary – Technical Metadata for Digital Still Images
  - MIX
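Much technical metadata can be machine-generated by reading it straight out of the image file. The sketch below reads the header of a PNG file, chosen only because its header layout is simple; the output keys are illustrative labels, not NISO Z39.87 or MIX element names, and the sample bytes are built in memory rather than read from a real scan.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
COLOR_TYPES = {0: "grayscale", 2: "RGB", 3: "indexed", 4: "grayscale+alpha", 6: "RGB+alpha"}

def png_technical_metadata(data):
    """Extract basic technical properties from the first bytes of a PNG file."""
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # IHDR starts with width and height (4 bytes each), then bit depth and color type (1 byte each).
    width, height, bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return {
        "width_px": width,
        "height_px": height,
        "bits_per_sample": bit_depth,
        "color": COLOR_TYPES.get(color_type, "unknown"),
    }

# Build just the header of a 3000 x 2400, 8 bits per channel, RGB PNG for demonstration.
sample_header = (
    PNG_SIGNATURE
    + struct.pack(">I", 13) + b"IHDR"
    + struct.pack(">IIBBBBB", 3000, 2400, 8, 2, 0, 0, 0)
)
print(png_technical_metadata(sample_header))
```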
Structural metadata
- For creating a logical structure between digital objects
  - Multiple copies of same bibliographic item
  - Multiple pages within item
  - Grouping of pages into sections
  - Multiple sizes of each page
- METS is the current primary schema
METS
- metsHdr
- dmdSec
- amdSec
  - techMD
  - rightsMD
  - sourceMD
  - digiprovMD
- fileSec
- structMap
- structLink
- behaviorSec
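To show how these sections nest, the sketch below assembles a bare METS skeleton for a multi-page item. It is not validated against the METS schema and leaves out required content such as the metadata wrappers and file locations; the IDs, labels, and the `mets_skeleton` function are invented for the example.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)

def q(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"

def mets_skeleton(page_labels):
    """Build an empty METS outline with one file and one structMap div per page."""
    root = ET.Element(q("mets"))
    ET.SubElement(root, q("metsHdr"))
    ET.SubElement(root, q("dmdSec"), ID="DMD1")
    amd_sec = ET.SubElement(root, q("amdSec"))
    for name in ("techMD", "rightsMD", "sourceMD", "digiprovMD"):
        ET.SubElement(amd_sec, q(name), ID=name.upper() + "1")
    file_grp = ET.SubElement(ET.SubElement(root, q("fileSec")), q("fileGrp"), USE="master")
    struct_map = ET.SubElement(root, q("structMap"), TYPE="physical")
    item = ET.SubElement(struct_map, q("div"), TYPE="item")
    for order, label in enumerate(page_labels, start=1):
        file_id = f"FILE{order}"
        ET.SubElement(file_grp, q("file"), ID=file_id)
        page = ET.SubElement(item, q("div"), TYPE="page", LABEL=label, ORDER=str(order))
        ET.SubElement(page, q("fptr"), FILEID=file_id)  # links the page to its file
    return root

document = mets_skeleton(["Front", "Back"])
print(ET.tostring(document, encoding="unicode"))
```

The structMap carries the page order and points into the fileSec by ID, which is how METS ties logical structure to actual files.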
Some other specialized metadata formats
- TEI
- EAD [example]
- GILS
- CSDGM [example]
- GEM
- CIDOC
Vocabularies
- TGM II
- TGN
- GeoNet
- AAT
- LCSH
- LCNAF
- DCMI Type
- MIME Types
Other considerations
- Standard formatting
- Repeatability of elements
- Describing original vs. digitized item
- Relationships between records
- Interoperability
Crosswalks
- For transforming between metadata formats
- Usually refers to transforming between content standards rather than structure standards, but not always
- Good practice to create and store the most robust metadata format possible, then create other views for specific needs
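At its simplest, a crosswalk is a field-to-field mapping applied to each record. The sketch below uses a deliberately tiny MARC-to-Dublin-Core mapping as an assumption for illustration; it is not a complete or authoritative crosswalk, and the record values and local field `999` are invented.

```python
# Simplified, illustrative mapping from MARC field tags to Dublin Core elements.
MARC_TO_DC = {
    "245": "title",
    "100": "creator",
    "260b": "publisher",
    "260c": "date",
    "650": "subject",
}

def crosswalk(record, mapping):
    """Translate a record (field -> list of values) into the target format's fields."""
    result = {}
    for field, values in record.items():
        target = mapping.get(field)
        if target is None:
            continue  # no equivalent in the target format, so the data is left behind
        result.setdefault(target, []).extend(values)
    return result

# Invented source record in the richer format.
marc_like = {
    "245": ["Views of the courthouse square"],
    "100": ["Smith, Jane"],
    "650": ["Courthouses", "Public squares"],
    "999": ["local processing note"],
}
print(crosswalk(marc_like, MARC_TO_DC))
```

The local note has no target and is dropped, which illustrates why the richest format is the one to store: going from rich to simple is easy, while the reverse cannot recover what was lost.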
The bottom line
- Many concepts from a century and a half of library cataloging inform good metadata practices
- But some re-examination needed for new environment
- Need for automation
- Need for smart people contributing!
More information
- Individual schema documentation
- Caplan: Metadata Fundamentals for All Librarians, 2003
- NISO Press: Understanding Metadata
- IFLA Functional Requirements for Bibliographic Records
Thank you!
- jenlrile@indiana.edu
- These presentation slides: <http://www.dlib.indiana.edu/~jenlrile/presentations/slis/04fall/l597walsh/L597.ppt>