One Document at a Time: Small-scale Digitization Projects

Presenters:
  • Peter Brueggeman, Scripps Inst. of Oceanography
  • Janet Webster, Hatfield Marine Science Center, OSU
  • Barbara Butler, Oregon Inst. of Marine Biology, UO
33rd Annual IAMSLIC Conference, Sarasota, Florida

Legacy Publication Digitization @ Scripps
Peter Brueggeman

Past endeavors
  • Vendor-produced PDFs from encoded text: smallest file size; costly; time spent on vendor interaction / proofing / revisions
  • Utilizing ILL staff, other staff: lower resolution scanning with routine ILL; quality issues
  • Do It Yourself: better results; least effort

Current Equipment Setup
  • Hewlett Packard ScanJet 7800 document scanner: dedicated sheet-feeding scanner; double-sided scanning
  • Plustek OpticBook 3600 Corporate flatbed book scanner: six millimeters between scan and edge; good for books with tight bindings
  • Adobe Acrobat: PDF optimization; OCR

Scan specification
  • Scan from disbound, trimmed originals
  • Scan from a photocopy if no disbound original is available, in order to sheet-feed
  • 600 ppi black/white (bitonal, 1-bit) scanning for text pages
  • Small file size and better text appearance with b/w scans (not for photos)
  • 600 ppi scan time is OK with sheet feeding
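The small file size of black/white scans follows from the raw bitmap arithmetic. The sketch below works it out under an assumption that is not in the slides: a US Letter page (8.5 x 11 inches). The function name is illustrative, and real PDF sizes will be much smaller and will vary with compression.

```python
def raw_bitmap_bytes(width_in, height_in, ppi, bits_per_pixel):
    """Uncompressed bitmap size in bytes for one scanned page."""
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    total_bits = width_px * height_px * bits_per_pixel
    return total_bits // 8

# Assumed page size (not stated in the slides): US Letter, in inches.
PAGE_W, PAGE_H = 8.5, 11.0

bw_600 = raw_bitmap_bytes(PAGE_W, PAGE_H, 600, 1)      # black/white, 1 bit per pixel
gray_300 = raw_bitmap_bytes(PAGE_W, PAGE_H, 300, 8)    # grayscale, 8 bits per pixel
color_300 = raw_bitmap_bytes(PAGE_W, PAGE_H, 300, 24)  # color, 24 bits per pixel

print(f"600 ppi b/w:       {bw_600:,} bytes")
print(f"300 ppi grayscale: {gray_300:,} bytes")
print(f"300 ppi color:     {color_300:,} bytes")
```

Before compression, a 600 ppi black/white page holds half the data of a 300 ppi grayscale page, even though it has four times as many pixels.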

300 ppi grayscale vs 600 ppi b/w @ 200%

300 ppi vs 600 ppi B/W @ 200%
  • 4 pages: 157 K vs 328 K

Scan Specification
  • 300 ppi grayscale scanning for halftone black and white photographs
  • 300 ppi color scanning for color photographs
  • Large PDF file size accumulates for pages scanned in grayscale or color
  • The same 4-page PDF is 1,150 K @ 300 ppi grayscale, whereas it is 157 K @ 300 ppi b/w or 328 K @ 600 ppi b/w
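The reported sizes for the 4-page sample can be put side by side as per-page figures and ratios. This is only a quick calculation on the numbers from the slide (reading the grayscale figure as 1,150 K); the helper name is illustrative.

```python
# File sizes in kilobytes for the same 4-page PDF, as reported on the slide.
SAMPLE_PAGES = 4
sizes_kb = {
    "300 ppi b/w": 157,
    "600 ppi b/w": 328,
    "300 ppi grayscale": 1150,
}

def size_ratios(sizes, baseline_key):
    """Return each size divided by the size of the baseline scan mode."""
    baseline = sizes[baseline_key]
    return {mode: kb / baseline for mode, kb in sizes.items()}

ratios = size_ratios(sizes_kb, "600 ppi b/w")
for mode, kb in sizes_kb.items():
    per_page = kb / SAMPLE_PAGES
    print(f"{mode}: {per_page:.1f} K per page, "
          f"{ratios[mode]:.2f}x the 600 ppi b/w size")
```

On these figures, grayscale at 300 ppi is roughly three and a half times the size of black/white at 600 ppi.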

Scan Specification
  • For pages consisting partly of a photograph, you may wish to paste photos scanned in grayscale / color onto black/white scanned text pages, in order to save some file size while ensuring photo quality
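The scan specification can be summarized as a per-page decision. The function below is a hypothetical sketch, not part of any scanning software: its name, parameters, and return format are assumptions made for illustration, while the resolutions and modes come from the slides.

```python
def choose_scan_passes(has_text, photo=None):
    """Return the scan passes for one page as (ppi, mode) tuples.

    photo is None, "halftone" (black and white photograph), or "color".
    """
    passes = []
    if has_text:
        # Text pages: 600 ppi black/white for small files and crisp type.
        passes.append((600, "black/white"))
    if photo == "halftone":
        # Halftone black and white photographs: 300 ppi grayscale.
        passes.append((300, "grayscale"))
    elif photo == "color":
        # Color photographs: 300 ppi color.
        passes.append((300, "color"))
    elif photo is not None:
        raise ValueError(f"unknown photo type: {photo!r}")
    return passes

text_page = choose_scan_passes(has_text=True)
photo_page = choose_scan_passes(has_text=False, photo="color")
mixed_page = choose_scan_passes(has_text=True, photo="halftone")
print(mixed_page)
```

A page that returns two passes is the mixed case: the photo scan is pasted onto the black/white text scan.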

600 ppi black/white scan vs 300 ppi grayscale scan

Scan Specification
One page with a photo on part of the page:
  • 600 ppi black/white PDF with unacceptable photo = 170 K
  • 300 ppi grayscale PDF with less than acceptable text = 760 K
  • 600 ppi black/white text & 300 ppi grayscale photo PDF = 1,275 K
  • 600 ppi grayscale PDF = 1,436 K

Document Production
  • For a yellowed/browned original, adjust the lightening setting in the scanning software to get white pages
  • Adobe Acrobat RECOGNIZE TEXT USING OCR is not highly accurate
  • Save the final PDF, then save it again via FILE > SAVE AS to reduce “document overhead”
  • Page through and proof the PDF

Document Production
  • Compress via PDF Optimizer if desired
  • Try different settings to judge results
  • My target upper file size is 20 megabytes
  • Save the original uncompressed version of the PDF
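A 20 megabyte ceiling translates into a rough page budget. The estimate below uses the 4-page sample sizes reported earlier in the talk (328 K at 600 ppi b/w, 1,150 K at 300 ppi grayscale) and assumes, purely for illustration, that size scales linearly with page count and that 1 MB = 1024 K. The function name is hypothetical.

```python
import math

TARGET_KB = 20 * 1024  # 20 megabytes, assuming 1 MB = 1024 K

def pages_within_target(kb_per_page, target_kb=TARGET_KB):
    """Estimate how many pages fit under the target file size."""
    return math.floor(target_kb / kb_per_page)

# Per-page sizes derived from the 4-page sample PDFs on the slides.
bw_pages = pages_within_target(328 / 4)     # 600 ppi black/white
gray_pages = pages_within_target(1150 / 4)  # 300 ppi grayscale

print(f"600 ppi b/w:       about {bw_pages} pages")
print(f"300 ppi grayscale: about {gray_pages} pages")
```

Real documents will differ with page content and PDF Optimizer settings, so this is a planning figure only.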

Digitization Initiatives at Oregon State University
Janet Webster

A cog in the OSU digitization process
Librarian is one player
  • Identify candidates
  • Investigate copyright
  • Send to the Digital Production Unit
DPU is the main dealer
  • Sliced if possible
  • Scanned & OCRed
  • Rebound, tied or dumped
  • Entered into appropriate digital collection/space
All projects/items fit into the bigger collection scheme

How it works

Another twist on how it works.

Oregon Birds
  • Donated journal from a retired faculty member.
  • Posted to the Cyamus list and was prompted to think about digitizing.
  • Contacted the Oregon Field Ornithologists (OFO), who were interested.
  • Generated a budget with help from my Technical Services Department chair.
  • Now negotiating with OFO.

Considerations
  • I have access to a good digitization unit. I use it.
  • I promote it and thank those involved.
  • I work with others. I couldn’t do it on my own at the branch.

Digitization Initiatives at University of Oregon (OIMB)
Barb Butler

The OIMB Approach
  • Add to Scholars’ Bank OR Oregon Explorer
  • Shared Collection Development with OSU
  • Long-term goal: Full-text Coos Bay Bibliography (Oregon South Coast)
  • Geo-spatially referenced (Yaquina Bay Bibliography model)
  • Primary targets in initial phase: student reports and theses; documents already in digital format

The OIMB Approach (in the beginning)
  • Student assistant
  • Ariel software
  • Flatbed scanner
  • 100 pages per hour
  • Reviewed by staff
  • OCR by Adobe
  • Uploaded
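The 100 pages per hour figure gives a simple way to budget student assistant time. The sketch below is illustrative only: the function name is made up and the 250-page document is a hypothetical example, not one from the talk. It covers scanning time alone, not staff review, OCR, or uploading.

```python
def scanning_hours(total_pages, pages_per_hour=100):
    """Estimate scanning time in hours at a given flatbed throughput."""
    if pages_per_hour <= 0:
        raise ValueError("pages_per_hour must be positive")
    return total_pages / pages_per_hour

# Hypothetical example: a 250-page thesis at the reported 100 pages per hour.
thesis_hours = scanning_hours(250)
print(f"Estimated scanning time: {thesis_hours} hours")
```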

Example 1:

Example 2:

Example 3:
1941 Printing:
  • OCLC: 15 libraries
  • Z39.50 Distributed Library: AIMS, Hopkins, MBL/WHOI
  • Aquatic Commons: Submitted 10/2007

The OIMB Approach (refined)
Same as part two with improvements:
  • Document feeder with duplex capability (Epson GT-2500)
  • Native scanner interface or Ariel interface
  • Also inputting into Aquatic Commons
Challenges still exist:
  • Lack of dithering option
  • Still scanning at 300 dpi, b/w and grayscale
  • OCR and collating documents