Making Portable Document Format (PDF) Files Work for You
Tuomas S. Kostiainen and Jill R. Sommer
ATA Conference 2009

PDF File Basics
• What is a PDF file and why do we use them?
  – Stands for "Portable Document Format." PDF is a multiplatform file format developed by Adobe Systems. A PDF file captures document text, fonts, images, and even formatting of documents from a variety of applications. You can e-mail a PDF document to your friend and it will look the same way on his screen as it looks on yours, even if he has a Mac and you have a PC. Since PDFs contain color-accurate information, they should also print the same way they look on your screen. (Source: TechTerms.com)

PDF File Basics
• Where do we as translators encounter PDF files?
  – Translation projects: source text, proofreading, reference
  – Forms: registration forms, IRS W-9
  – Creating PDFs: résumés, invoices, file sharing, printing/publishing
• Problems associated with PDF files
  – Rigid (not meant to be editable)
  – Converting from PDF > DOC etc.

Adobe Acrobat
• Versions: Adobe Reader, Adobe Acrobat Standard, Adobe Acrobat Pro Extended
• Product comparison: www.adobe.com/products/acrobat/matrix.html
• Compatibility with earlier versions
  – Update to the current version, 9
• Should be part of your toolbox

Other “Comparable” Products
• PDF Nitro (Express, Professional, and free version): www.nitropdf.com
• Foxit PDF Tools (Reader, Editor, Creator, Phantom, etc.): www.foxitsoftware.com/pdf/
• Solid PDF Tools: www.soliddocuments.com
• DocuCom PDF Gold: www.pdfwizard.com
• Pdf995 Suite: www.pdf995.com
• others

Working with PDF Files
1. Editing and Commenting PDF Files
2. Searching for Text in PDF Files
3. Creating and Filling Electronic Forms
4. Using Electronic Signatures
5. Creating TMs from PDF Files Using LogiTerm AlignFactory

Editing and Commenting PDF Files
• Using Commenting Tools in a normal review cycle (don’t use only sticky note comments)
  – available in Reader only if the PDF file author has enabled the document for commenting using Acrobat Pro (Advanced > Extend Features in Adobe Reader)
• Comment & Markup tools (Tools > Comment & Markup / Tools > Customize Toolbars)
  – Text Edits, Highlight Text, Callout, Arrow, Rectangle, etc.
  – Show/Hide Comments List (View > Navigation Panels > Comments)
  – Spell checking of notes (Edit > Check Spelling)

Editing and Commenting PDF Files
• Touching up text (changing text and text properties)
  – Tools > Advanced Editing / Advanced Editing toolbar
  – TouchUp Text, Crop
• Typewriter tool
  – available in Reader only if the PDF author has enabled it
• Inserting/extracting/rearranging pages
  – Document > Insert/Extract/Replace/Delete Pages, or use the Pages navigation pane on the left
• E-mail-based review or shared review (on acrobat.com) for multi-party reviews

Searching for Text in PDF Files
• Edit > Find: text within the current document
• Edit > Search: text in one or more files
• Indexing (only in Professional): possibility to index hundreds of files for quick searching
  – Select Advanced > Document Processing > Full Text Index with Catalog > New Index
  – Name the index, select directories to be included, click Build and specify location for the index file
  – Use the resulting .pdx file for searching
• Creating a Searchable Image
  – With the image file open in Acrobat, select Document > OCR Text Recognition > Recognize Text Using OCR

Creating and Filling Electronic Forms
• Simple filling with Typewriter tool
• Using text boxes
• Converting electronic files to forms using Form Wizard
• Blueberry PDF Form Filler; FREE application for filling in and printing PDF forms (www.bbconsult.co.uk/Resources/PDFFormFiller.aspx)
  – Note: deselect “Lock All Controls” button

Using Electronic Signatures
• Inserting a scanned signature
  – copy and paste via clipboard (gets inserted as a “stamp”)
• Creating and using a digital ID
  – Creating a digital ID: Advanced > Security Settings > Digital IDs > Add ID > “A new digital ID I want to create now” > Next > New PKCS#12 digital ID file > Next. Fill in the information fields as needed, then click Next. Select a location for the ID and define a password. Click Finish to return to the Security Settings dialog box. Click Close.
  – Signing a PDF document: Advanced > Sign & Certify > Place Signature. Drag a rectangle where you want to place the signature. Choose a digital ID, type the password, choose an appearance, and click Sign.

Creating Translation Memories from PDF Files Using LogiTerm AlignFactory
• Other tools:
  – YouAlign by LogiTerm; online tool, FREE (for a limited time), limited selection of languages (www.youalign.com)
  – NoBabel AutoAligner by KCSL; online tool, limited selection of languages (http://nobabel.com/)
• LogiTerm AlignFactory Light (http://www.terminotix.com)
  – Quick and easy tool to create TMs from PDF files

Additional PDF-related links
• www.adobe.com/support/
• www.planetpdf.com
• www.pdfstore.com

Part Two: Creating PDFs and OCR

Reasons translators might need to create a PDF
• Résumés
• Invoices
• Letters of certification
• Various protected files

Creating a PDF from Word or Excel
• Choose Print
  – PDF tool (Acrobat Distiller, Win2PDF, etc.)
• Select the menu button

Optical Character Recognition
• Optical character recognition (OCR) is the conversion of images of handwritten, typewritten, or printed text into machine-editable text.
• PDFs can be either straight text or a graphic. OCR can handle both.

Optical Character Recognition
• OCR tools use pattern recognition, artificial intelligence, and computer vision as well as digital character recognition.
• The accurate recognition of Latin-script, typewritten text is now considered largely a solved problem in applications where clear imaging is available, such as scanning of printed documents. Some tools can now easily recognize Cyrillic and Asian characters as well.

OCR Tools
• ABBYY FineReader (http://www.abbyy.com/)
• PDF Transformer (by ABBYY)
• OmniPage (http://www.nuance.com/imaging/products/omnipage.asp)
• Microsoft Office Document Imaging (part of MS Office 2007)
• ExperVision (http://www.expervision.com/)

ABBYY FineReader
• ABBYY is the clear favorite among translators (although PDF Transformer is a close second), because it creates fewer text boxes than other OCR programs.
• The spellcheck feature ensures the document you are working on doesn’t have any spelling errors that would corrupt the TM.
• ABBYY supports the most languages (184 at last count).

Abkhaz, Adyghian, Afrikaans, Agul, Albanian, Altai, Armenian (Eastern, Western, Grabar), Avar, Aymara, Azerbaijani (Cyrillic), Azerbaijani (Latin), Bashkir, Basic, Basque, Belarusian, Bemba, Blackfoot, Breton, Bugotu, Bulgarian, Buryat, C/C++, COBOL, Catalan, Cebuano, Chamorro, Chechen, Chinese Simplified, Chinese Traditional, Chukchee, Chuvash, Corsican, Crimean Tatar, Croatian, Crow, Czech, Dakota, Danish, Dargwa, Dungan, Dutch (Netherlands and Belgium), English, Eskimo (Cyrillic), Eskimo (Latin), Esperanto, Estonian, Evenki, Faroese, Fijian, Finnish, Fortran, French,

Frisian, Friulian, Gagauz, Galician, Ganda, German (Luxemburg), German (new and old spelling), Greek, Guarani, Hausa, Hawaiian, Hebrew, Hungarian, Icelandic, Ido, Indonesian, Ingush, Interlingua, Irish, Italian, JAVA, Japanese, Jingpo, Kabardian, Kalmyk, Karachay-balkar, Karakalpak, Kasub, Kawa, Kazakh, Khakass, Khanty, Kikuyu, Kirghiz, Kongo, Koryak, Kpelle, Kumyk, Kurdish, Lak, Latin, Latvian, Lezgi, Lithuanian, Luba, Macedonian, Malagasy, Malay, Malinke, Maltese, Mansy, Maori, Maya, Miao, Minangkabau, Mohawk, Moldavian, Mongol, Mordvin, Nahuatl, Nenets, Nivkh, Nogay,

Norwegian (nynorsk and bokmål), Nyanja, Occidental, Ojibway, Ossetian, Papiamento, Pascal, Polish, Portuguese (Portugal and Brazil), Provencal, Quechua, Rhaeto-romanic, Romanian, Romany, Rundi, Russian (old spelling), Rwanda, Sami (Lappish), Samoan, Scottish Gaelic, Selkup, Serbian (Cyrillic), Serbian (Latin), Shona, Simple chemical formulas, Slovak, Slovenian, Somali, Sorbian, Sotho, Spanish, Sunda, Swahili, Swazi, Swedish, Tabasaran, Tagalog, Tahitian, Tajik, Tatar, Thai, Tok Pisin, Tongan, Tswana, Tun, Turkish, Turkmen, Tuvinian, Udmurt, Uighur (Cyrillic), Uighur (Latin), Ukrainian,

Uzbek (Cyrillic), Uzbek (Latin), Welsh, Wolof, Xhosa, Yakut, Zapotec, Zulu

Spellchecking
• ABBYY supports pre- and post-reform German orthography, Old German script, scripting languages, and simple chemical formulas.
• ABBYY can also sometimes replicate graphics and logos that you can paste into your file.
• The text recognition software includes dictionaries with spell-checking capabilities for 38 languages, allowing verification of recognized text directly in the FineReader Editor.

Potential Problems
• Page setups can vary and create inconsistent margins and page layouts.
• OCR tools have problems with handwriting, bullet lists, check boxes, static from "fuzzy" fax transmissions, and tables.
• Formatting sometimes needs to be cleaned up (double spaces, text boxes, columns, etc.)
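
Some of the formatting cleanup mentioned above, such as double spaces, can be automated once the OCR result is saved as plain text. The following is only a minimal sketch in Python, not part of the presentation: the function name clean_ocr_text and the sample string are hypothetical, and the sketch does nothing about text boxes or columns, which still need manual work in Word.

```python
import re

def clean_ocr_text(text: str) -> str:
    """Tidy plain text exported from an OCR tool (hypothetical helper)."""
    # Collapse runs of spaces or tabs into a single space.
    text = re.sub(r"[ \t]{2,}", " ", text)
    # Drop whitespace left at the ends of lines.
    text = re.sub(r"[ \t]+\n", "\n", text)
    # Allow at most one empty line between paragraphs.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()

# Made-up sample with double spaces, a trailing space, and extra blank lines.
sample = "Page  setups   can vary. \n\n\n\nFormatting  needs cleanup.\t\t"
print(clean_ocr_text(sample))
```

Running a pass like this before importing the text into a translation tool keeps stray spacing out of the segments; anything it changes should still be checked against the source.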

ABBYY FineReader

Using OCR to create Word files
• Open image (PDF, TIF, etc.)
• Read file using OCR tool
• Check spelling (allows you to verify words that the OCR program misread or did not recognize)
• Save as Word file
• Create a clean file and copy and paste the text into it.

Troubleshooting
• Use Edit > Paste Special to copy the text into a fresh Word file and format it by hand.
• Delete the illegible pages in Adobe Acrobat and run the legible pages through the tool.
• OCR isn't a magic bullet. If the source text is very illegible, you may want to just give up and type it in by hand.

CodeZapper
• "CodeZapper" is a set of Word macros designed to “clean up” Word files before they are imported into a translation environment program such as Deja Vu DVX or MemoQ.
• Word documents are often strewn with junk or “rogue” tags (so-called “smart tags”, language tags, track changes tags, soft hyphenations, scaling and spacing changes, redundant bookmarks, etc.).
• This tagged information shows up in the DVX or MemoQ grid as spurious {1}codes{2} around, or even in the middle of, words, making sentences difficult to read and translate and generally negating many of the productivity benefits of the program.
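
To get a feel for how much such codes clutter a segment, you can count them in exported text. The following is only an illustrative sketch in Python and is not how CodeZapper works (CodeZapper is a set of Word macros that cleans the Word file itself). The {1}-style placeholder form is taken from the example above; the function name tag_report and the sample segment are hypothetical.

```python
import re
from typing import Tuple

# Assumed placeholder form: a number in curly braces, e.g. {1} or {27}.
TAG = re.compile(r"\{\d+\}")

def tag_report(segment: str) -> Tuple[int, str]:
    """Return (number of placeholders, segment with placeholders removed)."""
    count = len(TAG.findall(segment))
    plain = TAG.sub("", segment)
    return count, plain

# Made-up segment with codes around and inside words.
segment = "The {1}trans{2}lation {3}memory{4} is ready."
print(tag_report(segment))
```

A segment with several codes and no visible formatting in the source is a likely sign of rogue tags that are worth cleaning out of the Word file before import.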

CodeZapper
• OCR'd files or files converted from PDF are even worse.
• CodeZapper tries to remove as many of these tags as possible while retaining formatting and layout.
• It also contains a number of other macros which may be useful before and after importing files into DVX or MemoQ.

Final Words
• Do not use online OCR tools, such as web services based on the Tesseract OCR engine from Google, if your documents are confidential.
• Try several demos to determine which tool best suits your needs.
• Shop around for the best price ($399.99 vs. EUR 139/GBP 89 or even EUR 90 ($116)).

Links
• CodeZapper: http://www.transir.cn/redirect.php?tid=992&goto=lastpost
• Presentation: http://translationmusings.com/2009/11/05/presentation-from-ata-conference/