Media: Text
“Words and symbols in any form, spoken or written, are the most common system of communication.” ~ unknown
Text - Representation
● ASCII
  – 7-bit code
  – 128 values in ASCII character set (English alphabet)
  – use of 8th bit in text editors/word processors creates incompatibility
● ISO character sets
  – extended ASCII to support non-English text (symbols such as ¢ or œ)
  – ISO Latin provides support for accented characters
    ● à, ö, ø, etc.
  – ISO sets include Chinese, Japanese, Korean & Arabic
● UNICODE
  – 16-bit format (Roman vs. Western European or Kanji – Japan)
  – about 65,000 different symbols (2^16 = 65,536 code values)
  – 25 supported scripts of Version 2.0 Unicode Standard: Arabic, Armenian, Bengali, Bopomofo, Cyrillic, Devanagari, Georgian, Greek, Gujarati, Gurmukhi, Han, Hangul, Hebrew, Hiragana, Kannada, Katakana, Latin, Lao, Malayalam, Oriya, Phonetic, Tamil, Telugu, Thai, Tibetan
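The difference between the three representations above can be seen by encoding the same characters several ways. This is a minimal sketch: the sample characters and the helper name `fits_in_7_bits` are illustrative choices, and Python's `latin-1` and `utf-16-be` codecs stand in for "ISO Latin" and the 16-bit Unicode format.

```python
# Compare how the same characters are represented under different encodings.
samples = ["A", "é", "日"]  # illustrative: plain ASCII, accented Latin, Kanji

for ch in samples:
    print(f"{ch!r}: code point U+{ord(ch):04X}")
    for encoding in ("ascii", "latin-1", "utf-16-be"):
        try:
            data = ch.encode(encoding)
            print(f"  {encoding:10s} -> {data.hex()} ({len(data)} byte(s))")
        except UnicodeEncodeError:
            # The character has no code in this character set.
            print(f"  {encoding:10s} -> not representable")


def fits_in_7_bits(text: str) -> bool:
    """Return True if every character has a code below 128 (plain ASCII)."""
    return all(ord(c) < 128 for c in text)
```

Text that passes `fits_in_7_bits` survives any of these encodings unchanged in meaning; anything else depends on which 8-bit or 16-bit set the reader assumes, which is the incompatibility the slide mentions.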
ASCII
● All uppercase and lowercase letters
● Punctuation symbols like ! . , ? : ; “ ‘ etc.
● Digits 0, …, 9
● Arithmetic symbols + = - / < >
● Assorted special symbols like # @ $ % ^ & * ( ) { } [ ] etc.
● Invisible formatting characters
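The groups listed above can be counted directly. The sketch below sorts all 128 ASCII codes into classes using Python's `string` module; the function name `ascii_class` and the class labels are assumptions made for illustration, and punctuation, arithmetic and special symbols are lumped together because `string.punctuation` does not separate them.

```python
import string
from collections import Counter


def ascii_class(ch: str) -> str:
    """Classify a single character into one of the ASCII groups."""
    code = ord(ch)
    if code > 127:
        return "not ASCII"
    if ch in string.ascii_letters:
        return "letter"
    if ch in string.digits:
        return "digit"
    if ch in string.punctuation:
        return "punctuation or symbol"
    if code < 32 or code == 127:
        return "control (invisible)"
    return "space"  # code 32 is the only remaining value


# Tally every code in the 7-bit range 0..127.
counts = Counter(ascii_class(chr(i)) for i in range(128))
for name, total in counts.items():
    print(f"{name:22s} {total}")
```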
ASCII
Text - Representation
– Marked-up text
  ● nroff, troff
  ● LaTeX
  ● SGML
    – HTML
    – HyTime
    – XML, XSL, XLL
– Structured Text
  ● structure of text represented in data structure, usually tree-based
  ● ODA, structure embedded in byte-stream with content
– Hypertext
  ● non-linear graph or “web” structure: nodes and links
  ● currently subject of intensive ISO standards activity
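As a rough illustration of the tree-based and graph-based representations above, the sketch below parses a small marked-up fragment into a tree and models a hypertext as a dictionary of links. The markup, the page names and the helpers `outline` and `reachable` are all hypothetical; XML is used only because Python ships a parser for it.

```python
import xml.etree.ElementTree as ET

# Hypothetical marked-up text; the tags carry the structure.
markup = (
    "<doc><title>Text</title>"
    "<section><para>Words and <em>symbols</em>.</para></section></doc>"
)


def outline(element, depth=0):
    """Return the tag tree as indented lines, one per node."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(markup)
print("\n".join(outline(root)))

# Hypertext: nodes and links form a graph rather than a tree.
links = {"home": ["ascii", "unicode"], "ascii": ["unicode"], "unicode": ["home"]}


def reachable(start, graph):
    """Return every node that can be reached by following links from start."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return seen


print(sorted(reachable("ascii", links)))
```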
Text - Operations
● Character operations
  – basic data type with assigned value
  – permits direct character comparison (a<b)
● String operations
  – comparison
  – concatenation
  – substring extraction and manipulation
● Editing
  – perhaps the most familiar set of operations on text
  – cut/copy/paste
  – strings v. blocks, dependent on document structure
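The character, string and editing operations above map onto a few lines of code. This is a small sketch; `cut` and `paste` are hypothetical helper names that treat the document as a single flat string, which ignores the block structure the last bullet refers to.

```python
# Character operations: each character has an assigned numeric value,
# so characters can be compared directly.
print(ord("a"), ord("b"))  # numeric codes of the two characters
print("a" < "b")           # comparison follows the code values

# String operations: comparison, concatenation, substring extraction.
text = "Words" + " and " + "symbols"
print(text == "Words and symbols")
print(text[10:])           # substring from position 10 to the end


def cut(text: str, start: int, end: int):
    """Remove text[start:end]; return (remaining text, clipboard)."""
    return text[:start] + text[end:], text[start:end]


def paste(text: str, pos: int, clip: str) -> str:
    """Insert the clipboard contents at position pos."""
    return text[:pos] + clip + text[pos:]


rest, clip = cut(text, 0, 6)
print(repr(rest), repr(clip))
print(paste(rest, 4, clip))
```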
Text - Operations
● Formatting
  – interactive or non-interactive (WYSIWYG v. LaTeX)
  – formatted output
    ● bitmap
    ● page description language (PostScript, PDF)
  – font management
    ● typeface
    ● point size (1 point = 1/72 of an inch)
    ● TrueType fonts: geometric description + kerning
● Pattern-matching and Searching
  – search and replace
  – wildcards
  – regular expressions
  – for large bodies of text, or text databases, use of inverted indices, hashing techniques and clustering
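The searching techniques above can be sketched with the standard library. The three sample documents and the helper names `build_inverted_index` and `search` are made up for illustration; a real text database would add ranking, stemming and persistent storage.

```python
import fnmatch
import re
from collections import defaultdict

# Search and replace with a regular expression: "7-bit" -> "7 bit".
print(re.sub(r"(\d+)-bit", r"\1 bit", "ASCII is a 7-bit code"))

# Wildcards: shell-style pattern matching on a name.
print(fnmatch.fnmatch("unicode.txt", "uni*.txt"))

# Hypothetical document collection, keyed by document id.
documents = {
    1: "ASCII is a 7-bit code",
    2: "Unicode is a 16-bit format",
    3: "ISO Latin extends ASCII",
}


def build_inverted_index(docs):
    """Map each lowercased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index, *words):
    """Return ids of documents that contain every one of the given words."""
    sets = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*sets) if sets else set()


index = build_inverted_index(documents)
print(search(index, "ASCII"))
print(search(index, "ascii", "code"))
```

The index is built once; each query is then a few set lookups instead of a scan over every document, which is why the technique suits large bodies of text.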
Text - Operations
● Sorting
  – numerous varieties of sort, all of them extensively studied in basic programming
  – sort complexity is a major factor in data handling performance
● Compression
  – ASCII uses 7 bits per character, though most word-processors actually use the 8th bit to use up a byte per character
  – Information theory estimates 1-2 bits per character to be sufficient for natural language text
  – This redundancy can be removed by encoding:
    ● Huffman: varies the number of bits used to represent characters, shortest codes for highest frequency characters
    ● Lempel-Ziv: identifies repeating strings and replaces them by pointers to a table
    ● Both techniques compress English text at a ratio of between 2:1 and 3:1
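A minimal sketch of the Huffman idea above: build a code table from character frequencies so that frequent characters get short bit strings. The function names and the sample sentence are illustrative. For the Lempel-Ziv side, the sketch simply calls `zlib`, whose DEFLATE format combines an LZ77-style scheme with Huffman coding; it is not a from-scratch Lempel-Ziv implementation, and the ratio on such a short sample says nothing about the 2:1 to 3:1 figure.

```python
import heapq
import zlib
from collections import Counter


def huffman_codes(text: str) -> dict:
    """Return {character: bit string}, shorter for more frequent characters."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code so far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # two lightest subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def encode(text: str, codes: dict) -> str:
    return "".join(codes[ch] for ch in text)


def decode(bits: str, codes: dict) -> str:
    """Decode by matching prefixes; valid because no code prefixes another."""
    lookup = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:
            out.append(lookup[current])
            current = ""
    return "".join(out)


sample = "this is an example of a huffman tree"
codes = huffman_codes(sample)
bits = encode(sample, codes)
print(f"Huffman: {len(bits)} bits vs {7 * len(sample)} bits at 7 bits/char")

# Lempel-Ziv family via zlib (DEFLATE = LZ77-style matching + Huffman).
raw = sample.encode("ascii")
packed = zlib.compress(raw)
print(f"zlib: {len(raw)} bytes -> {len(packed)} bytes")
```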
Text - Operations
● Encryption
  – text encryption is widely used in electronic mail and networked information systems
  – most widely-used techniques:
    ● DES
    ● RSA public-key
    ● PGP
  – subject of major controversy:
    ● key escrow systems, Clipper chip
    ● “strong” encryption now being legally outlawed in a number of countries
● Language-specific operations
  – spell-checking
  – parsing and grammar checking
  – style analysis
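To make the RSA public-key entry above concrete, here is a toy sketch of the arithmetic applied to text one character at a time. The tiny primes, the exponent and the function names are illustrative assumptions; keys this small offer no security, and practical systems use very large primes plus padding schemes from a vetted library.

```python
# Toy RSA: illustration only, NOT secure.
p, q = 61, 53              # two small primes (assumed for the example)
n = p * q                  # public modulus
phi = (p - 1) * (q - 1)    # used to derive the private exponent
e = 17                     # public exponent, coprime with phi
d = pow(e, -1, phi)        # private exponent: modular inverse (Python 3.8+)


def encrypt(m: int) -> int:
    """Encrypt an integer m < n with the public key (e, n)."""
    return pow(m, e, n)


def decrypt(c: int) -> int:
    """Decrypt with the private key (d, n)."""
    return pow(c, d, n)


message = "Text"
cipher = [encrypt(ord(ch)) for ch in message]   # one number per character
plain = "".join(chr(decrypt(c)) for c in cipher)

print("public key :", (e, n))
print("ciphertext :", cipher)
print("decrypted  :", plain)
```

Anyone holding `(e, n)` can encrypt, but only the holder of `d` can decrypt, which is what distinguishes public-key schemes from a shared-key cipher such as DES.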
About Fonts and Faces
● A typeface is a family of graphic characters (includes many type sizes & styles)
● A font is a collection of characters of a single size
● Styles are boldface and italic (underlining & outlining)
● Serif vs. Sans Serif (‘sans’ is French for ‘without’)