Session 503
Book to Computer: Scanning Basics
Gaeir Dietrich, Director, High Tech Center Training Unit of the California Community Colleges

Overview
- Scanning and scanners
- Understanding scanning terminology
- Scanning workflow
12/13/2021 CTEBVI Conference

Scanning
- Scanning takes a picture.
  – The better the picture, the less editing later on.
- Similar technology to the copy machine, but it outputs to a digital file, not paper.

Stand-Alone vs. Multi-use
- Stand-alone scanners…
  – Provide more control over scans
  – Result in better scans
- Multi-use machines are copiers first, scanners second.
  – Final products require more editing during production
  – But it is still better than a flatbed scanner

Scanners
- When buying a scanner, think about these issues:
  – Duplex (two sides at once)
  – Automatic feed (pages per minute)
  – Color (for color dropout)
- We like Canon, followed by Fujitsu.
  – Canon DR-C125, DR-3010C

No Money?
- A $400 20-page-per-minute scanner is a far better deal than four $100 flatbed scanners.
- If you can only afford a flatbed, look for one with automatic document feed (ADF).

Scanning Outputs
- Color scanning usually creates a JPEG.
  – JPEGs are single pages only!
- Black and white scanning creates a TIFF.
  – TIFFs can be multiple pages.

What is a TIFF?
- TIFF files are graphics, i.e., pictures of text.
  – Tagged Image File Format (TIFF)
- Robust, stable standard file type
  – No version issues
  – Any program that can open multipage graphics can open a TIFF
  – Good archival graphical format

But I scan to…
- If you get anything other than a TIFF or JPEG, you have used software to convert.
- If you scan to PDF, you have used software to transform your file.
  – Scanning hardware does not create PDFs.
  – Conversion runs the risk of losing data and increasing editing time.

Converting TIFFs
- TIFF can be converted to other formats, including other graphic formats like PDF.
- To get to the text, you must run a TIFF file through an optical character recognition (OCR) program.

Scanning Is the First Step
- Settings for your scan will be determined by the end format you want to create.
- For text, you will scan, then run OCR.
  – Optical Character Recognition
  – See session 901 on Sunday

Scanning Terms
- Duplex vs. simplex
- Skew/deskew
- Margin control
- DPI (Resolution)
- Mode
- Brightness
- Contrast
- Threshold
- RGB color
- Color dropout

Duplex vs. Simplex
- Double-sided vs. single-sided
- Duplex = two sides at a time (one pass)
- Simplex = one side at a time
  – Flatbed scanners are simplex scanners
- Look for true duplex (one pass)
  – Not two passes with the program interleaving the scans
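To make the "two passes with the program interleaving the scans" idea concrete, here is a minimal sketch of what such software has to do. It assumes the stack is flipped over for the second pass, so the backs arrive last page first; the function name and page labels are hypothetical.

```python
def interleave_two_pass(fronts, backs_as_scanned):
    """Merge a fronts-only pass and a backs-only pass into reading order.

    Assumption: the stack was flipped for the second pass, so the backs
    arrive in reverse order and must be reversed before merging.
    """
    if len(fronts) != len(backs_as_scanned):
        raise ValueError("front and back passes have different page counts")
    backs = list(reversed(backs_as_scanned))
    merged = []
    for front, back in zip(fronts, backs):
        merged.extend([front, back])
    return merged


pages = interleave_two_pass(["p1", "p3", "p5"], ["p6", "p4", "p2"])
print(pages)  # ['p1', 'p2', 'p3', 'p4', 'p5', 'p6']
```

If a single sheet misfeeds in either pass, the page counts no longer match and the merge cannot be trusted, which is one reason a true one-pass duplex scanner is the safer choice.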

Skew
- Skew is slant
  – i.e., the page is not straight
- Snug the feed guides!
- Use deskew settings.
  – The computer can correct for some skew; too much and the text cannot be recognized

Margin Control
- Scanner determines page size
  – Avoids large black areas around the edge of the page
  – On better machines, also removes the need for measuring
- Better scanners will also have margin adjustment
  – Note that usually *all* edges are adjusted by the same amount.

DPI (Dots per Inch)
- “Dots” in scanning are really pixels
  – Little squares like on graph paper
- Imagine drawing by filling in squares on graph paper: the more squares, the smoother the lines
- Higher DPI = better resolution
- However, more is not always better!
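The graph-paper picture can be turned into arithmetic: the number of squares along each edge is the page size in inches times the DPI. The sketch below assumes a US Letter page (8.5 x 11 inches), which the slides do not specify.

```python
def pixel_dimensions(width_in, height_in, dpi):
    """Return (width, height) in pixels for a page scanned at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)


# Assumed page size: US Letter, 8.5 x 11 inches.
for dpi in (150, 300, 600):
    width_px, height_px = pixel_dimensions(8.5, 11, dpi)
    print(f"{dpi} DPI: {width_px} x {height_px} pixels "
          f"({width_px * height_px:,} total)")
```

At 300 DPI the page is 2550 x 3300 pixels. Doubling the DPI quadruples the pixel count, which is part of why more is not always better.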

DPI Comparison

Resolution (DPI)
- Standard for text is 300 DPI
- Small text may require 400 DPI
- Thin paper may require 150-200 DPI
- Really large text may require 200 DPI
- InftyReader for math requires 600 DPI

Mode
- Black & white
  – Looks like line art
  – Only choices for pixels are black or white
- Grayscale
  – Looks like black & white photo
  – Also called “halftone”
- Color
  – Comes in different “bits”
  – The more bits, the more color information

Black and White
- Image scanned in B/W: file size 474 KB

Black and White ED
- Image scanned in B&W ED (Canon DR-5080C): file size 474 KB

Grayscale
- Image scanned in Grayscale: file size 3,731 KB
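The grayscale sample above is roughly 7.9 times the size of the black and white one (3,731 KB vs. 474 KB). A rough sketch of why: black and white stores 1 bit per pixel, while grayscale typically stores 8. The calculation below covers uncompressed data only and assumes a 2550 x 3300 pixel page; real file sizes also depend on compression and page content.

```python
def raw_size_kb(width_px, height_px, bits_per_pixel):
    """Uncompressed image size in KB (1 KB = 1024 bytes)."""
    return width_px * height_px * bits_per_pixel / 8 / 1024


# Assumed page: 2550 x 3300 pixels (US Letter at 300 DPI).
bw_kb = raw_size_kb(2550, 3300, 1)    # black and white: 1 bit per pixel
gray_kb = raw_size_kb(2550, 3300, 8)  # grayscale: 8 bits per pixel
print(f"B/W: {bw_kb:,.0f} KB, grayscale: {gray_kb:,.0f} KB, "
      f"ratio: {gray_kb / bw_kb:.0f}x")
```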

Choosing the Mode
- Black and white
  – Best for text; smallest file size
- Black and white ED (error diffusion)
  – Better for graphics; slightly larger files
- Usually best to avoid grayscale
  – Large files that do not OCR as well
- Color
  – Sometimes necessary; large files

Which Mode to Choose?
- It depends on how important the graphics are!
- Is it for a student who has some usable vision and needs enlargement?
  – Grayscale or color may be needed
- Is it to create braille?
  – Black and white will usually give the best OCR results.

Brightness
- Overall darkness or lightness of page
- Balance
  – Not too dark, not too light
- Scale 1-255
  – Lower numbers decrease brightness (down into darkness)
  – Higher numbers increase brightness (up to the light)

Brightness Example
- It’s just like turning on lights over an entire room.

Adjusting Brightness
- Default is 128
- Too dark
  – Letter shapes run together
- Too light
  – Letter shapes are thin or broken
- Newsprint-type papers often need increased brightness

Brightness Guidelines
- Check the appearance of the scan
- If characters are thick and touching (running together): increase brightness
- If characters are thin and broken (lines thin/missing areas): reduce brightness
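The guidelines above amount to a small decision rule. Here is a minimal sketch of it, using the 1-255 scale and default of 128 from the slides; the step size of 16 and the symptom labels are assumptions chosen for illustration, not scanner settings.

```python
BRIGHTNESS_MIN, BRIGHTNESS_MAX, BRIGHTNESS_DEFAULT = 1, 255, 128


def adjust_brightness(current, symptom, step=16):
    """Suggest the next brightness value to try after inspecting a test scan.

    symptom: "thick" (characters touching), "thin" (characters broken),
    or "ok". The step size is an arbitrary starting point.
    """
    if symptom == "thick":
        proposed = current + step   # scan is too dark: brighten
    elif symptom == "thin":
        proposed = current - step   # scan is too light: darken
    elif symptom == "ok":
        proposed = current
    else:
        raise ValueError(f"unknown symptom: {symptom!r}")
    # Keep the suggestion inside the scanner's 1-255 range.
    return max(BRIGHTNESS_MIN, min(BRIGHTNESS_MAX, proposed))


print(adjust_brightness(BRIGHTNESS_DEFAULT, "thick"))  # 144
print(adjust_brightness(BRIGHTNESS_DEFAULT, "thin"))   # 112
```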

Sample Scans
- Too bright
- Just right
- Too dark

Contrast
- Difference between light and dark on page
- Scale is 1-13
- Higher number increases contrast
  – Darks darker, lights lighter
- Lower number decreases contrast
  – Darks get lighter, lights get darker
  – Becomes more uniform

Contrast Example

Adjusting Contrast
- Default is 7
- Low contrast
  – Entire page is either “muddy” looking
  – Or washed-out looking
- High contrast
  – Extremes of light and dark
  – May lose midrange detail
- Newsprint-type paper often needs increased brightness

Threshold
- In black and white mode
  – Sometimes just see brightness (contrast settings disappear)
- Sets where gray will be seen
- Increased threshold adds more white
  – More grays seen as white
- Decreased threshold adds more black
  – More grays seen as black
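A minimal sketch of how a threshold turns grays into black or white follows. It uses the slide's convention, where raising the threshold produces more white, and models each pixel by an assumed "darkness" value from 0 (white paper) to 255 (solid black). Some software uses the opposite convention, so check how your own scanner behaves.

```python
def binarize(darkness_values, threshold):
    """Map each gray pixel to "black" or "white".

    Assumption: a pixel turns black only when its darkness exceeds the
    threshold, so a higher threshold leaves more grays white.
    """
    return ["black" if d > threshold else "white" for d in darkness_values]


row = [10, 90, 130, 170, 240]        # a hypothetical row of gray pixels
for threshold in (80, 128, 180):
    result = binarize(row, threshold)
    print(threshold, result.count("white"), "white,",
          result.count("black"), "black")
```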

Despeckle
- “Erases” speckles
- Helps with small stray black dots
- Works really well when having to scan a photocopy or newsprint
- Beware of going too far and erasing periods and umlauts

Gamma…it’s complicated…
- Adjusts the middle tones
- Usually more useful for scanning graphics than text
  – Can be altered to bring out more detail in shadows in photos
- Usually on high-end hardware
- Try everything else first!
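For the curious, here is a sketch of the common power-law form of gamma correction on 0-255 brightness values. The exact formula a given scanner driver uses is an assumption here; the point is only that pure black and pure white stay put while the middle tones move.

```python
def apply_gamma(value, gamma):
    """Apply a power-law gamma curve to a 0-255 brightness value.

    With gamma above 1.0 the midtones get lighter; the endpoints
    0 and 255 are unchanged.
    """
    return round(255 * (value / 255) ** (1 / gamma))


for value in (0, 64, 128, 192, 255):
    print(value, "->", apply_gamma(value, 2.0))
```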

Settings Summary
- Brightness = overall tone
- Contrast = difference in highs and lows
- Gamma = adjustment in midtones
- Threshold = on or off switch for grays
  – Grays seen as white or black
  – May appear as just the “brightness” bar

RGB Color
- RGB = Red, Green, Blue
- RGB color system is used by TVs, computers, and scanners!

“Additive” Color System

Color Scanners
- Many color scanners for documents allow “color dropout”
- The scanner “ignores” a particular color
- “Erases” the color
  – Red, blue, or green

Color Dropout
- Drop out colored markings
  – Orange highlighter (drop out red)
  – Blue pen (drop out blue and despeckle)
- Yellowish pages
  – Drop out red (improves contrast)
- Tinted backgrounds
- Watch out for dropping out text
  – Be aware of color with white text on it
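One way to picture color dropout is as reading only one channel of each RGB pixel. This is a simplified model, not a description of any particular scanner, and the sample RGB values below are assumed for illustration. Anything bright in the dropped channel looks like blank paper.

```python
CHANNEL_INDEX = {"red": 0, "green": 1, "blue": 2}


def drop_out(pixel_rgb, channel):
    """Return the 0-255 brightness seen when only one channel is read.

    Simplified model: dropping out red means looking at the red channel
    alone, so anything with a lot of red looks as bright as white paper.
    """
    return pixel_rgb[CHANNEL_INDEX[channel]]


# Assumed sample colors (R, G, B).
samples = {
    "white paper": (255, 255, 255),
    "black text": (0, 0, 0),
    "orange highlighter": (255, 165, 0),
    "red background": (255, 0, 0),
}
for name, rgb in samples.items():
    print(f"{name}: brightness {drop_out(rgb, 'red')} with red dropped out")
```

In this model the orange highlighter reads as bright as white paper while black text stays black. The same arithmetic shows the pitfall from the slide: white text on a red background has the same brightness as its background once red is dropped out, so the text disappears.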

Scanned Page with Orange Highlighter

Same Page with Red Drop-out

Scanning Workflow
- Remove spine from book
- Separate any pages still glued together
- Choose a few representative pages for a test scan

Procedure Continued
- Scan representative pages to TIFF
  – Check image on screen for possible adjustments
- Run OCR on sample pages
  – Error rate should be no higher than one per page
  – Higher errors mean you need to adjust the scanner settings
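The test-scan check can be written down as a simple rule. This sketch treats "no higher than one per page" as an average over the sample pages, which is an interpretation rather than something the slides spell out; the error counts are made-up inputs you would tally by proofreading the OCR output.

```python
def needs_adjustment(errors_per_page, max_errors_per_page=1.0):
    """Return True if the sample pages average more OCR errors than allowed."""
    if not errors_per_page:
        raise ValueError("no sample pages were checked")
    average = sum(errors_per_page) / len(errors_per_page)
    return average > max_errors_per_page


print(needs_adjustment([0, 1, 1]))  # False: settings are good enough
print(needs_adjustment([3, 2, 4]))  # True: adjust settings and rescan
```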

Ready to Scan
- With the settings determined, scan the entire book.
- Now that you have a good picture, your OCR and editing should go quickly!

Advanced Ideas
- Be aware of individual pages that may need additional adjustment
  – A few pages may need to be scanned separately
  – A few pages may need color
  – Reassemble in your OCR program
- While checking test pages, also create OCR templates as appropriate

Suggestion on Organizing File Structure
- Label chapters (or chapter folders):
  – 01 Chapter
  – 02 Chapter
- Label front matter to place it first:
  – 00 Front Matter
- Label back matter just with its name:
  – Back Matter
- This file structure will create a logical order.
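The naming scheme works because digits sort before letters in a plain alphabetical sort. The sketch below uses Python's default string sort as a stand-in for a file browser; some file managers sort a little differently, so treat it as an illustration.

```python
folders = ["Back Matter", "02 Chapter", "00 Front Matter", "01 Chapter"]
print(sorted(folders))
# ['00 Front Matter', '01 Chapter', '02 Chapter', 'Back Matter']

# The leading zero matters once a book has ten or more chapters:
print(sorted(["10 Chapter", "2 Chapter"]))   # ['10 Chapter', '2 Chapter']
print(sorted(["10 Chapter", "02 Chapter"]))  # ['02 Chapter', '10 Chapter']
```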

Example

Timesaver: Create a Template Folder
- The template folder can be copied and pasted; all the inside folders are copied as well!
- Putting the zero in front makes the folder easy to find.
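The copy-and-paste step can also be scripted. This is a minimal sketch using Python's standard library; the folder names "00 Template" and "Example Book" are hypothetical, and it runs in a temporary directory so nothing is left behind.

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical subfolders, following the naming scheme from the slides.
SUBFOLDERS = ["00 Front Matter", "01 Chapter", "02 Chapter", "Back Matter"]


def make_template(root):
    """Create a '00 Template' folder containing the empty subfolders."""
    template = Path(root) / "00 Template"
    for name in SUBFOLDERS:
        (template / name).mkdir(parents=True, exist_ok=True)
    return template


def new_book_from_template(template, book_name):
    """Copy the template, including its empty subfolders, for a new book."""
    destination = template.parent / book_name
    shutil.copytree(template, destination)
    return destination


with tempfile.TemporaryDirectory() as root:
    template = make_template(root)
    book = new_book_from_template(template, "Example Book")
    print(sorted(p.name for p in book.iterdir()))
```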

Miscellaneous Tips
- Chopping books
  – Guillotine
  – X-Acto knife to remove the spine, and check with FedEx Office (Kinko’s) about cutting the pages
- Spines and flatbeds
  – If you have to scan a book with a thick spine on a flatbed, get a large dark piece of cloth and cover the scanner; this prevents the darkened area along the spine

What Do I Scan With?
- First try the software that came with your scanner
  – Often optimized to take advantage of all your scanner’s features
- With flatbed scanners, sometimes the software is not the best
  – Can scan with OCR programs and some graphics programs (e.g., Photoshop)

High Speed Scanning and Static
- Dryer sheets hung over paper path
  – Make sure to attach them well; don’t scan them! ;-)
- Add humidity
  – Run humidifier
  – Mist paper lightly with spray bottle

Keep It Clean!
- Clean your scanner often!
- Bits of glue and paper fall into scanner
  – Clean glass
- Print builds up on rollers
  – Clean rollers and feed guides

And the most important tip…
- Play with your scanner!
- Try big changes in the settings
  – Push brightness and contrast to the edges and see what happens!
- Compare and contrast
  – Try one page in B&W, grayscale, and color
  – Try thin paper, glossy paper, newsprint

Mammals Play to Learn
- Take time to learn your scanner
- Learning to enhance scanner settings will result in better “pictures” and less editing!
- Taking the time to create good scans saves time in the long run.

Happy scanning!
- Gaeir (rhymes with “fire”) Dietrich
- gdietrich@htctu.net
- 408-996-6047
- www.htctu.net