One Document at a Time: Small-scale Digitization Projects
Peter Brueggeman, Scripps Institution of Oceanography
Janet Webster, Hatfield Marine Science Center, OSU
Barbara Butler, Oregon Institute of Marine Biology, UO
33rd Annual IAMSLIC Conference, Sarasota, Florida
Legacy Publication Digitization at Scripps
Peter Brueggeman
Past Endeavors
- Vendor-produced PDFs from encoded text: smallest file size, but costly, with time spent on vendor interaction, proofing, and revisions
- Using ILL staff and other staff: lower-resolution scanning done as routine ILL work; quality issues
- Do it yourself: better results with the least effort
Current Equipment Setup
- Hewlett-Packard ScanJet 7800 document scanner: dedicated sheet-feeding scanner with double-sided scanning
- Plustek OpticBook 3600 Corporate flatbed book scanner: scan area reaches within six millimeters of the edge, good for books with tight bindings
- Adobe Acrobat: PDF optimization and OCR
Scan Specification
- Scan from disbound, trimmed originals
- If no disbound original is available, scan from a photocopy so that it can be sheet-fed
- 600 ppi black/white (bitonal, 1-bit) scanning for text pages
- Black/white scans give a smaller file size and better text appearance (not suitable for photos)
- 600 ppi scan time is acceptable with sheet feeding
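Part of why 600 ppi black/white stays small is simple arithmetic: a 1-bit pixel takes one eighth of the space of an 8-bit grayscale pixel. The sketch below computes the uncompressed bitmap size of one page in each mode. It assumes a US letter page (8.5 x 11 in) and byte-padded rows; `raw_page_bytes` is a hypothetical helper for illustration, and real PDF sizes also depend heavily on compression.

```python
# Uncompressed bitmap size for one scanned page.
# Assumption: a US letter page (8.5 x 11 in); real originals vary in size.

def raw_page_bytes(width_in: float, height_in: float, ppi: int, bits_per_pixel: int) -> int:
    """Return the uncompressed size in bytes of a page scanned at `ppi`."""
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    # Pad each row to a whole number of bytes, as most bitmap formats do.
    row_bytes = (width_px * bits_per_pixel + 7) // 8
    return row_bytes * height_px


if __name__ == "__main__":
    modes = {
        "600 ppi black/white (1-bit)": (600, 1),
        "300 ppi grayscale (8-bit)": (300, 8),
        "300 ppi color (24-bit)": (300, 24),
    }
    for label, (ppi, bpp) in modes.items():
        size = raw_page_bytes(8.5, 11, ppi, bpp)
        print(f"{label}: {size / 1_000_000:.1f} MB uncompressed")
```

Even with four times as many pixels, the 1-bit 600 ppi page (about 4.2 MB raw) is roughly half the raw size of the 300 ppi grayscale page (about 8.4 MB). Bitonal images also tend to compress very well, which helps explain the small black/white PDFs in the comparisons that follow.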
[Figure: 300 ppi grayscale vs. 600 ppi black/white, viewed at 200%]
[Figure: 300 ppi vs. 600 ppi black/white, viewed at 200%; 4 pages: 157 K vs. 328 K]
Scan Specification
- 300 ppi grayscale scanning for halftone black-and-white photographs
- 300 ppi color scanning for color photographs
- PDF file size grows quickly for pages scanned in grayscale or color: the same 4-page PDF is 1,150 K at 300 ppi grayscale, versus 157 K at 300 ppi b/w or 328 K at 600 ppi b/w
Scan Specification
For pages that are partly a photograph, you may wish to paste photos scanned in grayscale or color onto text pages scanned in black/white, saving some file size while preserving photo quality.
[Figure: 600 ppi black/white scan vs. 300 ppi grayscale scan]
Scan Specification
One page with a photo covering part of the page:
- 600 ppi black/white PDF, unacceptable photo: 170 K
- 300 ppi grayscale PDF, less than acceptable text: 760 K
- 600 ppi black/white text with a 300 ppi grayscale photo: 1,275 K
- 600 ppi grayscale PDF: 1,436 K
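The trade-off in the one-page figures above can be made explicit with a little arithmetic. This sketch uses only the reported sizes; `percent_saved` is a hypothetical helper name.

```python
# One-page PDF sizes (in K) from the comparison above.
sizes_k = {
    "600 ppi b/w": 170,
    "300 ppi grayscale": 760,
    "600 ppi b/w text + 300 ppi grayscale photo": 1275,
    "600 ppi grayscale": 1436,
}


def percent_saved(smaller: float, larger: float) -> float:
    """Percentage reduction when going from `larger` to `smaller`."""
    return 100 * (larger - smaller) / larger


saving = percent_saved(
    sizes_k["600 ppi b/w text + 300 ppi grayscale photo"],
    sizes_k["600 ppi grayscale"],
)
print(f"Pasting the photo saves {saving:.1f}% versus an all-grayscale 600 ppi page")
```

Pasting saves about 11% against a full 600 ppi grayscale page while keeping both sharp text and an acceptable photo; the all-300 ppi grayscale page is smaller still, but at the cost of text quality.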
Document Production
- For yellowed or browned originals, adjust the lightening setting in the scanning software to get white pages
- Adobe Acrobat's Recognize Text Using OCR is not highly accurate
- Save the final PDF, then save it again via File > Save As to reduce "document overhead"
- Page through and proof the PDF
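Why lightening matters for black/white scans: bitonal output maps each pixel to pure black or pure white, so a tinted background that falls on the dark side of the cut-off turns black along with the text. The sketch below assumes a simple fixed threshold plus a brightness offset; the names `to_bitonal` and `lighten` are hypothetical, and real scanner software may use its own (often adaptive) method.

```python
# Bitonal conversion with a brightness ("lightening") offset.
# Assumption: a fixed global threshold; actual scanner drivers may differ.

def to_bitonal(gray_pixels, threshold=128, lighten=0):
    """Map 8-bit gray values (0 = black, 255 = white) to 0 or 255.

    `lighten` is added to every pixel before thresholding, so a browned
    background that would otherwise fall below the threshold turns white.
    """
    result = []
    for value in gray_pixels:
        adjusted = min(255, value + lighten)
        result.append(255 if adjusted >= threshold else 0)
    return result


# Heavily browned paper (about 120) with dark text (about 30).
row = [120, 118, 30, 25, 122, 119]
print(to_bitonal(row))              # [0, 0, 0, 0, 0, 0]
print(to_bitonal(row, lighten=40))  # [255, 255, 0, 0, 255, 255]
```

Without the offset the whole row goes black; with it, the paper drops out to white while the text stays black.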
Document Production
- Compress via PDF Optimizer if desired; try different settings to judge the results
- My target upper file size is 20 megabytes
- Save an original, uncompressed version of the PDF
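A rough way to see what the 20 MB target means in pages is to extrapolate from the 4-page sample sizes reported earlier (157 K, 328 K, 1,150 K). This sketch assumes 1 MB = 1,024 K and that every page compresses like the sample, which real documents will not do exactly; `pages_within_target` is a hypothetical helper.

```python
# Rough page budget under a 20 MB target, extrapolated from the 4-page samples.
# Assumptions: 1 MB = 1,024 K; every page is the sample's average size.

TARGET_K = 20 * 1024
SAMPLE_PAGES = 4
sample_sizes_k = {
    "300 ppi b/w": 157,
    "600 ppi b/w": 328,
    "300 ppi grayscale": 1150,
}


def pages_within_target(sample_k: float, sample_pages: int, target_k: float) -> int:
    """Whole pages that fit under target_k at the sample's average page size."""
    per_page_k = sample_k / sample_pages
    return int(target_k // per_page_k)


for mode, size_k in sample_sizes_k.items():
    pages = pages_within_target(size_k, SAMPLE_PAGES, TARGET_K)
    print(f"{mode}: about {pages} pages")
```

Under these assumptions the budget is roughly 521 pages at 300 ppi b/w, 249 at 600 ppi b/w, and only 71 at 300 ppi grayscale, which is why grayscale is reserved for photographs.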
Digitization Initiatives at Oregon State University
Janet Webster
A Cog in the OSU Digitization Process
The librarian is one player:
• Identify candidates
• Investigate copyright
• Send to the Digital Production Unit (DPU)
The DPU is the main dealer:
• Sliced if possible
• Scanned and OCRed
• Rebound, tied, or dumped
• Entered into the appropriate digital collection or space
All projects and items fit into a bigger collection scheme.
How it works
Another twist on how it works.
Oregon Birds
- Donated journal from a retired faculty member
- Posted to the Cyamus list, which prompted me to think about digitizing it
- Contacted the Oregon Field Ornithologists (OFO), who were interested
- Generated a budget with help from my Technical Services Department chair
- Now negotiating with OFO
Considerations
- I have access to a good digitization unit, and I use it.
- I promote it and thank those involved.
- I work with others; I couldn't do it on my own at the branch.
Digitization Initiatives at University of Oregon (OIMB)
Barb Butler
The OIMB Approach
- Add to Scholars' Bank or Oregon Explorer
- Shared collection development with OSU
- Long-term goal: a full-text Coos Bay Bibliography (Oregon South Coast), geospatially referenced (Yaquina Bay Bibliography model)
- Primary targets in the initial phase: student reports and theses; documents already in digital format
The OIMB Approach (in the beginning)
- Student assistant
- Ariel software
- Flatbed scanner, 100 pages per hour
- Reviewed by staff
- OCR by Adobe
- Uploaded
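At the reported flatbed rate of 100 pages per hour, scanning time for a batch is easy to estimate. In this sketch the page counts and the four-hour student shift are hypothetical, and the estimate covers scanning only, not staff review, OCR, or uploading.

```python
import math

PAGES_PER_HOUR = 100  # flatbed rate reported for the initial OIMB setup


def scanning_hours(page_counts, pages_per_hour=PAGES_PER_HOUR):
    """Total scanning time in hours for a batch of documents."""
    return sum(page_counts) / pages_per_hour


def shifts_needed(hours, shift_hours=4):
    """Student-assistant shifts needed, rounded up (4-hour shift is hypothetical)."""
    return math.ceil(hours / shift_hours)


batch = [42, 118, 65, 230]  # hypothetical page counts for reports and theses
hours = scanning_hours(batch)
print(f"{sum(batch)} pages: {hours:.2f} hours, {shifts_needed(hours)} four-hour shifts")
```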
Example 1:
Example 2:
Example 3: 1941 printing
- OCLC: 15 libraries
- Z39.50 distributed library: AIMS, Hopkins, MBL/WHOI
- Aquatic Commons: submitted 10/2007
The OIMB Approach (refined)
Same as part two, with improvements:
- Document feeder with duplex capability (Epson GT-2500)
- Native scanner interface or Ariel interface
- Also inputting into Aquatic Commons
Challenges still exist:
- Lack of a dithering option
- Still scanning at 300 dpi, b/w and grayscale
- OCR and collating documents
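Dithering lets a black/white scan approximate gray tones, such as those in photographs, by varying the density of black and white dots. The sketch below uses Floyd-Steinberg error diffusion, one widely used method, on a grayscale image stored as a list of rows; the source does not say which method a scanner's dithering option would use, so treat this as an illustration of the technique rather than any particular scanner's behavior.

```python
# Floyd-Steinberg error diffusion: 8-bit grayscale rows -> bitonal (0/255).

def floyd_steinberg(image):
    """Return a bitonal copy of `image` (a list of rows of 0-255 values)."""
    height = len(image)
    width = len(image[0]) if height else 0
    # Work on floats so the diffused error is not truncated.
    pixels = [[float(v) for v in row] for row in image]
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            old = pixels[y][x]
            new = 255 if old >= 128 else 0
            out[y][x] = new
            error = old - new
            # Push the quantization error onto neighbours not yet visited.
            if x + 1 < width:
                pixels[y][x + 1] += error * 7 / 16
            if y + 1 < height:
                if x > 0:
                    pixels[y + 1][x - 1] += error * 3 / 16
                pixels[y + 1][x] += error * 5 / 16
                if x + 1 < width:
                    pixels[y + 1][x + 1] += error * 1 / 16
    return out


# A flat mid-gray patch becomes a fine pattern of black and white dots
# whose overall density approximates the original tone.
patch = [[128] * 8 for _ in range(8)]
for row in floyd_steinberg(patch):
    print("".join("#" if v == 0 else "." for v in row))
```

Plain thresholding would turn the same mid-gray patch entirely white or entirely black; spreading the error to neighbouring pixels is what preserves the tone.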